SIGNAL SEPARATING APPARATUS, SIGNAL SEPARATING METHOD, 
SIGNAL SEPARATING PROGRAM AND RECORDING MEDIUM 

TECHNICAL FIELD 
[0001] The present invention relates to the signal processing technical field 
and, in particular, to a technique for extracting a source signal from a mixture 
in which multiple source signals are mixed in a space. 

BACKGROUND ART 
[0002] A beamformer (beamforming) is a widely known conventional technique 
for extracting a particular signal through the use of multiple sensors and 
suppressing the other signals (for example, see Non-patent literature 1). 
However, the beamformer requires information about the direction of a target 
signal and therefore has the drawback of being difficult to use in situations 
in which such information cannot be obtained (or cannot be estimated). 

One newer technique is Blind Signal Separation (BSS) (for example, see 
Non-patent literature 2). BSS is advantageous in that it does not require the 
information that the beamformer requires and is expected to find application 
in various situations. Signal separation using BSS will be described below. 

[0003] [Blind Signal Separation] 

First, BSS is formulated. It is assumed here that all signals are sampled at 
a certain sampling frequency f_s and are discretely represented. It is also 
assumed that N signals are mixed and observed by M sensors. In the following 
description, a situation is dealt with in which signals are attenuated and 
delayed with the distance from the signal sources to the sensors and distortion 
in the transmission channels can occur due to reflections of the signals by 
objects such as walls. Signals mixed in such a situation can be expressed, 
using the impulse responses h_qk(r) from sources k to sensors q (where q is 
the sensor's number [q = 1, ..., M] and k is the source's number 
[k = 1, ..., N]), as a convolutive mixture 
[0004] [Formula 1] 

$$x_q(t) = \sum_{k=1}^{N} \sum_{r=0}^{\infty} h_{qk}(r)\, s_k(t - r) \qquad (1)$$

where t denotes the sampling time, s_k(t) denotes the source signal 
originated from signal source k at sampling time t, x_q(t) denotes the signal 
observed by sensor q at sampling time t, and r is a sweep variable. 
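
As an illustration of Equation (1), the following is a minimal sketch of a convolutive mixture with finite-length impulse responses. The function name, array layout, and toy values are assumptions made for this example and are not part of the specification.

```python
import numpy as np

def convolutive_mixture(sources, impulse_responses):
    """Mix N sources into M sensor signals following Equation (1).

    sources: array of shape (N, T), the source signals s_k(t).
    impulse_responses: array of shape (M, N, R), the responses h_qk(r),
        truncated to R taps (an assumption; Equation (1) has no fixed length).
    Returns an array of shape (M, T) holding x_q(t).
    """
    n_sources, n_samples = sources.shape
    n_sensors = impulse_responses.shape[0]
    mixed = np.zeros((n_sensors, n_samples))
    for q in range(n_sensors):
        for k in range(n_sources):
            # Full convolution, truncated to the observation length T.
            mixed[q] += np.convolve(sources[k], impulse_responses[q, k])[:n_samples]
    return mixed

# Toy case: 2 sources, 2 sensors, 3-tap responses (illustrative values only).
s = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]])
h = np.zeros((2, 2, 3))
h[0, 0] = [1.0, 0.5, 0.0]
h[0, 1] = [0.0, 1.0, 0.0]
h[1, 0] = [0.0, 1.0, 0.0]
h[1, 1] = [1.0, 0.0, 0.25]
x = convolutive_mixture(s, h)
```

Each sensor signal is the sum over sources of the source convolved with the corresponding channel response, which is what makes the mixture "convolutive" rather than instantaneous.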

A typical impulse response h_qk(r) has a strong pulse-like response after a 
time lapse and then attenuates with time. The purpose of blind signal 
separation is to obtain separated signals y_1(t), ..., y_N(t), each 
corresponding to one of the source signals s_1(t), ..., s_N(t), only from the 
observed signals (hereinafter referred to as "mixed signals") without the aid 
of information about the source signals s_1(t), ..., s_N(t) and the impulse 
responses h_11(r), ..., h_1N(r), ..., h_M1(r), ..., h_MN(r). 
[0005] [Frequency domain] 

A process of conventional BSS will be described below. 

Operations for separation are performed in the frequency domain. 
Therefore, an L-point Short-Time discrete Fourier Transformation (STFT) is 
applied to the mixed signal x_q(t) at a sensor q to obtain a time-series signal 
at each frequency. 
[0006] [Formula 2] 

$$X_q(f, \tau) = \sum_{r=-L/2}^{(L/2)-1} x_q(\tau + r)\, g(r)\, e^{-j 2\pi f r} \qquad (2)$$
Here, f is one of the frequencies that are discretely sampled as 
f = 0, f_s/L, ..., f_s(L − 1)/L (where f_s is the sampling frequency), τ is 
discrete time, j is the imaginary unit, and g(r) is a window function. The 
window function may be a window that has the center of power at g(0), such as 
a Hanning window. 
[0007] [Formula 3] 

$$g(r) = \frac{1}{2}\left(1 + \cos\frac{2\pi r}{L}\right)$$

In this case, X_q(f, τ) represents a frequency characteristic of the mixed 
signals x_q(t) centered at time t = τ. It should be noted that X_q(f, τ) 
includes information about L samples and X_q(f, τ) does not need to be 
obtained for all τ. Therefore, X_q(f, τ) is obtained at τ with an appropriate 
interval. 
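
The short-time transform of Equation (2) with a Hanning window centered at r = 0 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the zero padding at the signal edges, and the `hop` parameter (the "appropriate interval" between frame centers τ) are choices made for this example. Frequencies are handled as bin indices m, with f = m·f_s/L.

```python
import numpy as np

def stft_frame(x, tau, frame_len):
    """Evaluate Equation (2) at one frame center tau for all L frequency bins.

    x: 1-D real signal x_q(t); tau: frame center (sample index);
    frame_len: the even frame length L. Samples outside x count as zero
    (an assumption of this sketch).
    Returns X_q(f, tau) for bins f = 0, f_s/L, ..., f_s(L-1)/L.
    """
    r = np.arange(-(frame_len // 2), frame_len // 2)
    window = 0.5 * (1.0 + np.cos(2.0 * np.pi * r / frame_len))  # peak at r = 0
    idx = tau + r
    valid = (idx >= 0) & (idx < len(x))
    segment = np.zeros(frame_len)
    segment[valid] = x[idx[valid]]
    bins = np.arange(frame_len)
    # Bin m corresponds to f = m * f_s / L, so f * r / f_s = m * r / L.
    kernel = np.exp(-2j * np.pi * np.outer(bins, r) / frame_len)
    return kernel @ (segment * window)

def stft(x, frame_len, hop):
    """Stack frames taken every `hop` samples; result has shape (L, n_frames)."""
    centers = np.arange(0, len(x), hop)
    return np.stack([stft_frame(x, tau, frame_len) for tau in centers], axis=1)
```

Because each frame already covers L samples, a hop larger than one sample (for example L/4 or L/2) is usually sufficient, which is the point made in the paragraph above.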

By performing the processing in the frequency domain, the 
convolutive mixture in the time domain expressed by Equation (1) can be 
approximated as a simple mixture at each frequency as 
[0008] [Formula 4] 

$$X_q(f, \tau) = \sum_{k=1}^{N} H_{qk}(f)\, S_k(f, \tau) \qquad (3)$$

Thus, operations for separation are simplified. Here, H_qk(f) is the frequency 
response from source k to sensor q, and S_k(f, τ) is obtained by applying a 
Short-Time Discrete Fourier Transformation to the source signal s_k(t) 
according to an equation similar to Equation (2). In vector notation, 
Equation (3) can be written as 
[0009] [Formula 5] 

$$X(f, \tau) = \sum_{k=1}^{N} H_k(f)\, S_k(f, \tau) \qquad (4)$$

where X(f, τ) = [X_1(f, τ), ..., X_M(f, τ)]^T is a mixed-signal vector and 
H_k(f) = [H_1k(f), ..., H_Mk(f)]^T is the vector consisting of the frequency 
responses from source k to the sensors. Here, [*]^T represents the transposed 
vector of [*]. 
[Signal separation using Independent Component Analysis] 

One approach to blind signal separation is signal separation using 
Independent Component Analysis (ICA). In the approach using ICA, a separation 
matrix W(f) of N rows and M columns and a separated signal vector 

$$Y(f, \tau) = W(f)\, X(f, \tau) \qquad (5)$$

are calculated solely from the mixed-signal vector X(f, τ). Here, the 
separation matrix W(f) is calculated such that the elements (separated signals) 
Y_1(f, τ), ..., Y_N(f, τ) of the separated signal vector 
Y(f, τ) = [Y_1(f, τ), ..., Y_N(f, τ)]^T are independent of each other. For 
this calculation, an algorithm such as the one described in Non-patent 
literature 4 may be used. 

[0010] In ICA, separation is made by exploiting the independence of signals. 
Accordingly, the obtained separated signals Y_1(f, τ), ..., Y_N(f, τ) have an 
ambiguity of order. This is because the independence of signals is retained 
even if the order of the signals changes. The order ambiguity problem, known 
as a permutation problem, is an important problem in signal separation in the 
frequency domain. The permutation problem must be solved in such a manner that 
the suffix p of the separated signals Y_p(f, τ) corresponding to the same 
source signal S_k(f, τ) is the same at all frequencies f. 

Examples of conventional approaches to solving the permutation problem 
include the one described in Non-patent literature 5. In that approach, 
information about the position of a signal source (the direction and the 
distance ratio) is estimated with respect to the positions of two selected 
sensors (a sensor pair). The estimates at multiple sensor pairs are combined to 
obtain more detailed positional information. These estimates are clustered as 
positional information, and the estimates that belong to the same cluster are 
considered as corresponding to the same source, thereby solving the 
permutation problem. 
[0011] [Signal separation using time-frequency masking] 

Another approach to blind signal separation is a method using 
time-frequency masking. This approach is a signal separation and extraction 
method effective even if the relation between the number N of sources and the 
number M of sensors is such that M < N. 

In this approach, the sparseness of signals is assumed. Signals are said to 
be "sparse" if they are null at most discrete times t. The sparseness of 
signals can be observed, for example, in speech signals in the frequency 
domain. The assumption of the sparseness and independence of signals makes it 
possible to assume that the probability that multiple coexisting signals are 
observed to overlap one another at a time-frequency point (f, τ) is low. 
Accordingly, it can be assumed that the mixed signal at each time-frequency 
point (f, τ) at each sensor consists of only one signal s_p(f, τ) that is 
active at that time-frequency point (f, τ). Therefore, mixed-signal vectors are 
clustered by an appropriate feature quantity, a time-frequency mask M_k(f, τ) 
to be used for extracting the mixed signals X(f, τ) that correspond to the 
member time-frequencies (f, τ) of each cluster C_k is generated, and each 
signal is separated and extracted according to 

$$Y_k(f, \tau) = M_k(f, \tau)\, X_{Q'}(f, \tau).$$

Here, X_Q'(f, τ) is one of the mixed signals and Q' ∈ {1, ..., M}. 

[0012] The feature quantity used for the clustering may be obtained, for 
example, as follows. The phase difference between the mixed signals at two 
sensors (a sensor q and a reference sensor Q; hereinafter Q is referred to as 
the reference value and the sensor that corresponds to the reference value Q is 
denoted as the reference sensor Q) is calculated as 
[0013] [Formula 6] 

$$\phi(f, \tau) = \angle \frac{X_q(f, \tau)}{X_Q(f, \tau)} \qquad (8)$$

and, from the phase difference, the Direction of Arrival (DOA) 
[0014] [Formula 7] 

$$\theta(f, \tau) = \cos^{-1} \frac{\phi(f, \tau) \cdot c}{2\pi \cdot f \cdot d} \qquad (9)$$

can be calculated as the feature quantity used for the clustering (for example, 
see Non-patent literature 3). Here, "d" is the distance between sensor q and 
reference sensor Q and "c" is the signal transmission speed. Also, the 
k-means method (for example, see Non-patent literature 6) may be used for the 
clustering. The time-frequency mask M_k(f, τ) may be generated by calculating 
the averages θ̃_1, θ̃_2, ..., θ̃_N of the members of each cluster C_k and 
obtaining 
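[0015] [Formula 8] 

$$M_k(f, \tau) = \begin{cases} 1 & \tilde{\theta}_k - \Delta \le \theta(f, \tau) \le \tilde{\theta}_k + \Delta \\ 0 & \text{otherwise} \end{cases} \qquad (k = 1, \ldots, N)$$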

Here, Δ gives the range in which signals are extracted. In this method, as Δ 
is reduced, the separation and extraction performance increases but the 
nonlinear distortion increases; on the other hand, as Δ is increased, the 
nonlinear distortion decreases but the separation performance degrades. 
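
The conventional two-sensor feature and mask described above can be sketched as follows. This is a minimal sketch, not the method of the cited literature: the function names, the default c = 340 m/s, the sign convention of the phase difference, and the clipping of the arccos argument are assumptions of this example. The frequency must be nonzero, and the sensor spacing d must be small enough that the phase difference does not wrap.

```python
import numpy as np

def doa_feature(X_q, X_Q, freq_hz, d, c=340.0):
    """DOA estimate theta(f, tau) from Equations (8) and (9), in radians."""
    phi = np.angle(X_q / X_Q)                        # Equation (8)
    cos_theta = phi * c / (2.0 * np.pi * freq_hz * d)
    # Clip so noise cannot push the argument outside the domain of arccos.
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))  # Equation (9)

def binary_mask(theta, center, delta):
    """Formula 8: 1 where theta lies within delta of the cluster average."""
    return ((theta >= center - delta) & (theta <= center + delta)).astype(float)
```

A separated signal for cluster k would then be obtained by multiplying the mask by one of the mixed signals, as in Y_k(f, τ) = M_k(f, τ) X_Q'(f, τ). A narrow `delta` passes fewer time-frequency points, which illustrates the trade-off between separation performance and nonlinear distortion noted above.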

Another feature quantity that can be used for the clustering may be 
the phase difference between the mixed signals at two sensors (sensor q and 
reference sensor Q) (Equation (8)) or the gain ratio between the two sensors 
[0016] [Formula 9] 
$$\alpha(f, \tau) = \left| \frac{X_q(f, \tau)}{X_Q(f, \tau)} \right|$$
DISCLOSURE OF THE INVENTION 

ISSUES TO BE SOLVED BY THE INVENTION 

[0017] However, the conventional art described above has the problem that 
information obtained from signals observed by multiple sensors cannot be used 
efficiently and simply for signal separation. 

For example, a problem with signal separation using independent component 
analysis is that it requires complicated operations to accurately solve the 
permutation problem. That is, the conventional approach to solving the 
permutation problem estimates the direction and the distance ratio at each 
individual sensor pair. Accordingly, in order to accurately solve the 
permutation problem, estimates obtained at multiple sensors had to be 
combined. Furthermore, the estimates have errors. Therefore, sensor pairs that 
were likely to have fewer errors had to be used on a priority basis or the 
method for combining the estimates had to be designed such that errors in the 
estimates were accommodated. Another problem with the approach was that 
information about the positions of sensors had to be obtained beforehand 
because of the need to estimate information about the positions of signal 
sources. This is disadvantageous when sensors are randomly disposed. Even if 
sensors are regularly disposed, it is difficult to obtain precise positional 
information and therefore operations such as calibration must be performed in 
order to solve the permutation problem more accurately. 

[0018] For the conventional signal separation using time-frequency masking, 
only methods that use two sensors have been proposed. If there are more than 
two sensors, information about only two particular sensors q and Q among the 
sensors has been used to calculate a feature quantity. This means a reduction 
in dimensionality and therefore in the amount of information as compared with 
the case where all available sensors are used. Accordingly, information about 
all sensors was not efficiently used, whereby the performance was limited. To 
use information about all sensors effectively, feature quantities obtained 
with multiple sensor pairs can be combined as in the approach in Non-patent 
literature 5, for example. However, in order to combine feature quantities, 
additional processing for extracting the feature quantities is required and 
some technique may have to be used in combining them, such as selecting and 
using sensor pairs that are likely to have fewer errors. This approach also 
has the problem that precise information about the positions of sensors must 
be obtained beforehand. This is disadvantageous when sensors are to be 
positioned randomly. Even if sensors are regularly disposed, it is difficult 
to obtain precise positional information and therefore operations such as 
calibration must be performed for more accurate signal extraction. 

[0019] The fundamentals of blind signal separation are to separate mixed 
signals observed by sensors and to extract multiple separated signals. 
However, not all the separated signals are important; only some of the 
separated signals may include a target signal. In such a case, the separated 
signals that contain the target signal must be selected. Conventional blind 
signal separation does not provide information indicating which separated 
signals include a target signal. Therefore, some other means must be used to 
determine which separated signals contain a target signal. 

[0020] The present invention has been made in light of these 
circumstances, and an object of the present invention is to provide a technique 
capable of simply and efficiently using information obtained from signals 
observed by multiple sensors to perform signal separation. 


MEANS TO SOLVE ISSUES 

[0021] According to the present invention, in order to solve the problems 
described above, first a frequency domain transforming section transforms 
mixed signals observed by multiple sensors into mixed signals in the 
frequency domain. Then, a normalizing section normalizes complex vectors 
generated by using the mixed signals in the frequency domain to generate 
normalized vectors excluding the frequency dependence of the complex 
vectors. A clustering section then clusters the normalized vectors to generate 
clusters. The clusters are then used for signal separation. 

The generation of the clusters does not require direct use of precise 
information about the positions of the sensors observing the mixed signals as 
input information. Furthermore, the clusters are generated on the basis of 
information that is dependent on the positions of the signal sources. Thus, 
according to the present invention, signal separation can be performed 
without using precise information about the positions of the sensors. 
[0022] According to the present invention, the normalizing section 
preferably includes a first normalizing section which normalizes the argument 
of each element of a complex vector on the basis of one particular element of 
the complex vector and a second normalizing section which divides the 
argument of each element normalized by the first normalizing section by a 
value proportional to the frequency. 

The normalized complex vectors form clusters that are dependent on the 
positions of the signal sources. Thus, signal separation can be performed 
without using precise information about the positions of the sensors. 
[0023] According to the present invention, the normalizing section 
preferably further includes a third normalizing section which normalizes the 
norm of a vector consisting of the elements normalized by the second 
normalizing section to a predetermined value. 

The normalized complex vectors form clusters that are dependent on the 
positions of the signal sources. By normalizing the norm of the vector 
consisting of the elements normalized by the second normalizing section, the 
clustering operation is simplified. 

According to a preferred mode of the first aspect of the present 
invention, the frequency domain transforming section first transforms the 
mixed signals observed by multiple sensors into mixed signals in the 
frequency domain. Then, a separation matrix computing section calculates a 
separation matrix for each frequency by using the frequency-domain mixed 
signals and an inverse matrix computing section calculates a generalized 
inverse matrix of the separation matrix. Then, a basis vector normalizing 
section normalizes the basis vectors constituting the generalized inverse 
matrix to calculate normalized basis vectors. A clustering section then 
clusters the normalized basis vectors into clusters. Then, a permutation 
computing section uses the center vectors of the clusters and the normalized 
basis vectors to calculate a permutation for sorting the elements of the 
separation matrix. It should be noted that the notion of a basis vector is 
included in the notion of a complex vector. 

[0024] According to the first aspect of the present invention, basis vectors 
are normalized and then clustered to calculate a permutation for solving a 
permutation problem. Therefore, information about the positions of sensors 
does not need to be obtained beforehand for the clustering. According to a 
preferred mode of the present invention, all elements of the normalized basis 
vectors are clustered together to calculate a permutation for solving a 
permutation problem. Therefore, unlike the conventional art, operations 
for combining the results of estimation are not required. 
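
One way to picture the permutation step is sketched below. The text above states only that the permutation is calculated from the cluster center vectors and the normalized basis vectors; the squared-distance criterion, the exhaustive search over orderings, and the function name are assumptions of this sketch.

```python
import numpy as np
from itertools import permutations

def permutation_for_frequency(normalized_basis, centroids):
    """Pick the column order that best matches the cluster centers.

    normalized_basis: complex array (M, N); column p is the normalized basis
        vector of separated signal p at one frequency.
    centroids: complex array (M, N); column k is the center vector of cluster k.
    Returns a tuple `perm` where perm[k] is the index of the basis vector
    assigned to cluster k (assumed criterion: minimum total squared distance).
    """
    n = centroids.shape[1]

    def cost(perm):
        diff = normalized_basis[:, list(perm)] - centroids
        return np.sum(np.abs(diff) ** 2)

    # Exhaustive search over N! orderings; practical only for small N.
    return min(permutations(range(n)), key=cost)
```

Under this sketch, the rows of the separation matrix at that frequency would be reordered as `W[list(perm), :]` so that row k corresponds to the same source at every frequency.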

In the first aspect of the present invention, more preferably the basis 
vector normalizing section normalizes the basis vector to eliminate its 
frequency dependence. More preferably, the normalization for eliminating the 
frequency dependence of the basis vector is achieved by normalizing the 
argument of each element of the basis vector on the basis of one particular 
element of the basis vector and dividing the argument of each element by a 
value proportional to the frequency. By this normalization, clusters that are 
dependent on the positions of signal sources can be generated. 
[0025] In the first aspect of the present invention, the normalization that 
eliminates frequency dependence of the basis vector is performed more 
preferably by calculating 
[0026] [Formula 10] 

$$A_{qp}'(f) = |A_{qp}(f)| \exp\left[ j\, \frac{\arg[A_{qp}(f) / A_{Qp}(f)]}{4 f c^{-1} d} \right] \qquad (10)$$

for each element A_qp(f) (where q = 1, ..., M and M is the number of sensors 
that observe mixed signals) of the basis vector A_p(f) (where p = 1, ..., N 
and N is the number of signal sources). Here, "exp" is the exponential 
function with Napier's number as its base, arg[·] is an argument, "f" is the 
frequency, "j" is the imaginary unit, "c" is the signal transmission speed, 
"Q" is a reference value selected from the natural numbers less than or equal 
to M, and "d" is a real number. That is, the normalization performed by 
calculating Equation (10) normalizes the argument of each element of a basis 
vector by using one particular element of the basis vector as the reference 
and dividing the argument of each element by a value proportional to the 
frequency. This normalization eliminates dependence on frequencies. 
Furthermore, the normalization does not need precise information about the 
positions of sensors. 

[0027] The real number "d" in Equation (10) is preferably the maximum 
distance d_max between the reference sensor Q corresponding to the element 
A_Qp(f) and another sensor because this typically improves the accuracy of the 
clustering. The reason will be detailed later. 

In the first aspect of the present invention, a basis vector is normalized 
to a frequency-independent frequency-normalized vector and this 
frequency-normalized vector is then normalized to a normalized basis vector 
whose norm has a predetermined value. The normalized basis vector 
generated by the two-step normalization is independent of frequencies and 
dependent only on the positions of signal sources. It should be noted that the 
norm normalization simplifies the clustering operation. 
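
The two-step normalization can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the zero-based reference index, the default values of d and c, and the choice of 1 as the predetermined norm are illustrative, and the frequency must be nonzero.

```python
import numpy as np

def normalize_basis_vector(a, freq_hz, ref=0, d=0.05, c=340.0):
    """Two-step normalization of one basis vector A_p(f).

    a: complex array of shape (M,), the elements A_qp(f); freq_hz must be > 0.
    ref: zero-based index of the reference sensor Q; d, c: as in Equation (10).
    """
    phase = np.angle(a / a[ref])                       # arg[A_qp(f) / A_Qp(f)]
    scaled = phase / (4.0 * freq_hz * d / c)           # divide by 4 f c^-1 d
    freq_normalized = np.abs(a) * np.exp(1j * scaled)  # Equation (10)
    # Norm normalization; the predetermined value is assumed to be 1 here.
    return freq_normalized / np.linalg.norm(freq_normalized)
```

For a source whose channel to each sensor is a pure gain and delay, the phase of A_qp(f)/A_Qp(f) grows linearly with f, so dividing by a value proportional to f leaves a quantity that depends only on the delays. Taking the phase relative to the reference element and fixing the norm also removes the arbitrary complex scaling that ICA leaves on each basis vector.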

[0028] In the first aspect of the present invention, preferably a permutation 
is calculated by using the envelope of separated signals (the envelope of the 
absolute values of separated signals), center vectors of clusters, and 
normalized basis vectors. Thus, a permutation problem can be solved more 
accurately. 

According to a preferable second aspect of the present invention, a 
frequency domain transforming section transforms mixed signals observed by 
multiple sensors into mixed signals in the frequency domain and a signal 
separating section calculates a separation matrix and separated signals for 
each frequency by using the frequency-domain mixed signals. Then, a target 
signal selecting section selects selection signals including a target signal 
from among the separated signals. In this procedure, basis vectors which are 
columns of the generalized inverse matrix of the separation matrix are 
normalized, the normalized basis vectors are clustered, and selection signals 
are selected by using the variance of the clusters as the indicator. If the 
separation matrix is a square matrix, its generalized inverse matrix is 
equivalent to its inverse matrix. That is, the notion of a generalized inverse 
matrix includes ordinary inverse matrices. 

[0029] By using the variance of clusters as the indicator, a signal nearer to 
a sensor can be located as a target signal and separated signals including the 
target signal can be selected as selection signals. The reason will be 
described below. The normalization of basis vectors is performed such that 
normalized basis vectors form clusters that are dependent only on the 
positions of signal sources in a given model (for example, a near-field model) 
that is an approximation of a convolutive mixture of signals originated from 
multiple signal sources. However, there are various factors in a real 
environment that are not reflected in such a model. For example, 
transmission distortions of signals caused as they are reflected by objects 
such as walls are not reflected in a near-field model. Such a discrepancy 
between a real environment and a model increases as the distance from a signal 
source to the sensors increases; signals nearer to the sensors exhibit a 
smaller discrepancy. Accordingly, signals nearer to the sensors can be 
normalized under conditions closer to those in a real environment and 
therefore the variance of clusters caused by discrepancies between the real 
environment and a model can be smaller. Based on the realization of this 
relation, a preferred mode of the second aspect of the present invention 
extracts selection signals including a target signal closer to the sensors by 
using the variance of clusters as the indicator. The above operation can 
extract a target signal and suppress other interfering signals to some extent. 

[0030] However, if a separation matrix and separated signals are calculated 
by using Independent Component Analysis (ICA), the number of interfering 
signals that can be completely suppressed by the above process is equal to the 
number of sensors minus 1 at most. If there are more interfering signals, 
unsuppressed interfering signal components will remain. Therefore, 
according to the present invention, preferably a mask generating section 
generates a time-frequency mask by using frequency-domain mixed signals 
and basis vectors, and a masking section applies the time-frequency mask to 
the selected selection signals. Thus, interfering signals remaining in the 
selection signals can be better suppressed even if the number of signal sources 
is larger than that of the sensors. 
[0031] In the second aspect of the present invention, the mask generating 
section preferably generates a whitening matrix by using the 
frequency-domain mixed signals, uses the whitening matrix to transform a 
mixed-signal vector consisting of the frequency-domain mixed signals to a 
whitened mixed-signal vector and to transform the basis vectors to whitened 
basis vectors, then calculates the angle between the whitened mixed-signal 
vector and the whitened basis vector at each time-frequency, and generates a 
time-frequency mask by using a function including the angle as an element. 
By applying the time-frequency mask to selection signals, interfering signals 
remaining in the selection signals can be suppressed. 

[0032] In the second aspect of the present invention, the whitening matrix 
is preferably V(f) = R(f)^(-1/2), where R(f) = <X(f, τ)·X(f, τ)^H>_τ, f is a 
frequency, τ is discrete time, X(f, τ) is a mixed-signal vector, <*>_τ is the 
time average of "*", and *^H is the complex conjugate transpose of the vector 
"*" (a vector obtained by transposing the complex conjugate of the elements of 
the vector). Then, a whitened mixed-signal vector Z(f, τ) is calculated as 
Z(f, τ) = V(f)·X(f, τ) and a whitened basis vector B(f) is calculated as 
B(f) = V(f)·A(f), where A(f) is a basis vector. The angle θ(f, τ) is 
calculated as θ(f, τ) = cos^(-1)(|B^H(f)·Z(f, τ)| / (||B(f)||·||Z(f, τ)||)), 
where |*| is the absolute value of "*" and ||*|| is the norm of the vector 
"*". A logistic function M(θ(f, τ)) = α/(1 + e^(g·(θ(f, τ) − θ_T))) is 
calculated as a time-frequency mask, where α, g, and θ_T are real numbers. The 
time-frequency mask can be applied to extracted selection signals to further 
suppress interfering signals remaining in the selection signals. 
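
The mask generation described above can be sketched for a single frequency bin as follows. This is a minimal sketch under stated assumptions: the function name and the default values of `alpha`, `g`, and `theta_T` are illustrative, the inverse square root is computed by eigendecomposition (one possible way to obtain R(f)^(-1/2)), and R(f) is assumed to be positive definite, which requires at least M linearly independent frames.

```python
import numpy as np

def logistic_angle_mask(X, a, alpha=1.0, g=20.0, theta_T=np.pi / 4):
    """Time-frequency mask for one frequency bin.

    X: complex array (M, T) of mixed-signal vectors X(f, tau) over T frames.
    a: complex array (M,), the basis vector A(f) of the selected signal.
    alpha, g, theta_T: real parameters of the logistic function
        (the default values here are illustrative only).
    """
    R = (X @ X.conj().T) / X.shape[1]            # R(f) = <X X^H>_tau
    eigval, eigvec = np.linalg.eigh(R)           # R is Hermitian
    V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.conj().T   # V(f) = R(f)^(-1/2)
    Z = V @ X                                    # whitened mixed-signal vectors
    b = V @ a                                    # whitened basis vector
    cos_theta = np.abs(b.conj() @ Z) / (np.linalg.norm(b) * np.linalg.norm(Z, axis=0))
    theta = np.arccos(np.clip(cos_theta, 0.0, 1.0))
    return alpha / (1.0 + np.exp(g * (theta - theta_T)))
```

Time-frequency points whose whitened mixed-signal vector is nearly parallel to the whitened basis vector (small θ) receive a mask value near α, while points at larger angles are attenuated smoothly. The parameter g sets the steepness of the transition and θ_T its location.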

[0033] In the second aspect of the present invention, the target signal 
selecting section preferably performs normalization that eliminates frequency 
dependence from a basis vector. In the second aspect of the present 
invention, the normalization that eliminates frequency dependence from a 
basis vector more preferably normalizes the argument of each element of the 
basis vector by using one particular element of the basis vector as the 
reference and divides the argument of each element by a value proportional to 
the frequency. In the second aspect of the present invention, the 
normalization that eliminates frequency dependence of a basis vector is 
performed preferably by calculating 
[0034] [Formula 11] 

$$A_{qp}'(f) = |A_{qp}(f)| \exp\left[ j\, \frac{\arg[A_{qp}(f) / A_{Qp}(f)]}{4 f c^{-1} d} \right] \qquad (11)$$

for each element A_qp(f) (where q = 1, ..., M and M is the number of sensors 
observing mixed signals) of the basis vector A_p(f) (where p is a natural 
number). Here, exp is the exponential function with Napier's number as its 
base, arg[·] is an argument, f is the frequency, j is the imaginary unit, c is 
the signal transmission speed, Q is a reference value selected from the 
natural numbers less than or equal to M, and "d" is a real number. As a result 
of this normalization, the normalized basis vectors form clusters that are 
dependent only on the positions of signal sources in a given model which is an 
approximation of a convolutive mixture of signals originated from the multiple 
signal sources. Consequently, separated signals including a target signal can 
be selected by using the magnitude of the variance of clusters as the 
indicator as described above. The normalization does not require precise 
information about the positions of sensors. 

[0035] The real number "d" in the above-described Equation (11) is 
preferably the maximum distance d_max between a reference sensor Q and 
another sensor because this typically improves the accuracy of clustering. 
The reason will be detailed later. 

In the second aspect of the present invention, the target signal 
selecting section selects a cluster that yields the minimum variance and 
selects separated signals corresponding to the selected cluster as the 
selection signals including a target signal. Thus, the signal that has the 
smallest discrepancy from the model (for example, the signal nearest to a 
sensor) can be selected as the target signal. 
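
The minimum-variance selection can be sketched as follows. The function name, the array layout, and the definition of variance as the mean squared distance of the members from their cluster mean are assumptions of this sketch. Each cluster is assumed to have at least one member.

```python
import numpy as np

def select_min_variance_cluster(vectors, labels, n_clusters):
    """Return the index of the cluster with the smallest variance.

    vectors: complex array (n_vectors, M) of normalized basis vectors
        collected over all frequencies.
    labels: integer array (n_vectors,) with the cluster index of each vector.
    """
    variances = np.empty(n_clusters)
    for k in range(n_clusters):
        members = vectors[labels == k]
        center = members.mean(axis=0)
        # Mean squared distance of the members from the cluster center.
        variances[k] = np.mean(np.sum(np.abs(members - center) ** 2, axis=1))
    return int(np.argmin(variances)), variances
```

The separated signals whose basis vectors belong to the returned cluster would then be taken as the selection signals.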

[0036] In a preferable third aspect of the present invention, first a 
frequency domain transforming section transforms mixed signals observed by 
multiple sensors into mixed signals in the frequency domain. Then, a vector 
normalizing section normalizes a mixed-signal vector consisting of the 
frequency-domain mixed signals to obtain a normalized vector. Then, a 
clustering section clusters the normalized vectors to generate clusters. Then, 
a separated signal generating section extracts an element of a mixed-signal 
vector corresponding to the time-frequency of the normalized vector 
belonging to the k-th cluster and generates a separated signal vector having 
the element as its k-th element. 
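
The last two steps of this procedure can be sketched as follows, assuming that the normalized vectors and the cluster centers are already available. The function names, the nearest-center assignment rule, the array layout, and the choice of which sensor's element is extracted (`out_sensor`) are assumptions of this sketch.

```python
import numpy as np

def assign_clusters(normalized, centroids):
    """Label each time-frequency point with its nearest cluster center.

    normalized: complex array (M, F, T) of normalized mixed-signal vectors.
    centroids: complex array (K, M) of cluster center vectors.
    Returns an integer array of labels with shape (F, T).
    """
    diff = normalized[None, :, :, :] - centroids[:, :, None, None]
    dist = np.sum(np.abs(diff) ** 2, axis=1)      # shape (K, F, T)
    return np.argmin(dist, axis=0)

def separated_signals(X, labels, n_clusters, out_sensor=0):
    """Build Y_k(f, tau): the chosen element of X where (f, tau) is in cluster k.

    X: complex array (M, F, T) of frequency-domain mixed signals.
    Time-frequency points outside cluster k are set to zero in Y_k.
    """
    Y = np.zeros((n_clusters,) + labels.shape, dtype=complex)
    for k in range(n_clusters):
        Y[k] = np.where(labels == k, X[out_sensor], 0.0)
    return Y
```

Because every element of the normalized vector enters the distance computation, all sensors contribute to the cluster assignment at once, rather than one sensor pair at a time.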

[0037] In the third aspect of the present invention, mixed signals observed 
by all sensors are normalized and clustered, and information about each 
cluster is used to generate a separated signal vector. This means that the 
separated signals are extracted by using information about all sensors at a 
time. This processing does not need precise information about the positions 
of sensors. Thus, according to the third aspect of the present invention, 
signal separation can be performed by using information obtained from all of 
the observed signals in a simple and efficient manner without needing precise 
information about the positions of sensors. 

In the third aspect of the present invention, the vector normalizing 
section preferably performs normalization that eliminates frequency 
dependence from a mixed-signal vector consisting of the frequency-domain 
mixed signals. More preferably, the normalization that eliminates frequency 
dependence from a mixed-signal vector includes a normalization of the argument 
of each element of the mixed-signal vector by using one particular element of 
the mixed-signal vector as the reference and a division of the argument of each 
element by a value proportional to the frequency. More preferably, the 
normalization that eliminates frequency dependence from the mixed-signal 
vector is performed by calculating 
[0038] [Formula 12] 

$$X_q'(f, \tau) = |X_q(f, \tau)| \exp\left[ j\, \frac{\arg[X_q(f, \tau) / X_Q(f, \tau)]}{4 f c^{-1} d} \right] \qquad (12)$$


for each element X_q(f, τ) (where q = 1, ..., M and M is the number of sensors 
observing mixed signals) of the mixed-signal vector. Here, exp is the 
exponential function with Napier's number as its base, arg[·] is an argument, 
j is the imaginary unit, c is the signal transmission speed, Q is a value 
selected from the natural numbers less than or equal to M, d is a real number, 
f is a frequency, and τ is discrete time. Thus, frequency dependence can be 
eliminated. Consequently, clusters dependent on the positions of signal 
sources can be formed. It should be noted that this normalization does not 
require precise information about the positions of sensors. 

[0039] The real number "d" in the above-described Equation (12) is 
preferably the maximum distance d_max between the sensor corresponding to 
element X_Q(f, τ) and another sensor because the precision of clustering is 
typically improved by this. The reason will be detailed later. 

In the third aspect of the present invention, the vector normalizing 
section preferably performs normalization that eliminates frequency 
dependence from a mixed-signal vector and normalization that normalizes its 
norm to a predetermined value. This simplifies clustering operations. 

EFFECTS OF THE INVENTION 
[0040] As has been described, according to the present invention,
information obtained from signals observed by multiple sensors can be used
in a simple and efficient manner to perform signal separation.

For example, according to the first aspect of the present invention,
the permutation problem can be solved accurately without needing to obtain
information about the precise sensor positions beforehand or to perform
complicated operations. According to the second aspect of the present
invention, a target signal can be extracted from mixed signals which are a
mixture of signals originated from multiple sources (even if N > M), without
information about the direction of the target signal. According to the third
aspect of the present invention, information obtained from all signals
observed can be used in a simple and efficient manner to perform signal
separation (even if N > M), without needing precise information about sensor
positions.

BRIEF DESCRIPTION OF THE DRAWINGS

[0041] Fig. 1 is a block diagram illustrating a functional configuration of a
signal separating apparatus incorporating the principles of the present
invention;

Fig. 2 is a block diagram illustrating a hardware configuration of a
signal separating apparatus according to a first embodiment;

Fig. 3 illustrates a block diagram of the signal separating apparatus
according to the first embodiment;

Fig. 4A is a block diagram illustrating details of a permutation
problem solving section shown in Fig. 3; Fig. 4B is a block diagram
illustrating details of a basis vector normalizing section shown in Fig. 4A;

Fig. 5 is a flowchart outlining a whole process performed in the
signal separating apparatus according to the first embodiment;

Fig. 6 is a flowchart for describing details of a normalization
process according to the first embodiment;

Fig. 7A is a complex plane used for illustrating the relation between
an element A_qp''(f) of a normalized basis vector for each value of parameter
"d" and the element's argument arg[A_qp''(f)] when d_max/2 > d; Fig. 7B is a
complex plane used for illustrating the relation between an element A_qp''(f)
of a normalized basis vector for each value of parameter "d" and the
element's argument arg[A_qp''(f)] when d_max/2 < d < d_max;

Fig. 8A is a complex plane used for illustrating the relation between
an element A_qp''(f) of a normalized basis vector for each value of parameter
"d" and the element's argument arg[A_qp''(f)] when d = d_max; Fig. 8B is a
complex plane used for illustrating the relation between an element A_qp''(f)
of a normalized basis vector for each value of parameter "d" and the
element's argument arg[A_qp''(f)] when d > d_max;

Fig. 9 is a block diagram illustrating a signal separating apparatus
according to a second embodiment;

Fig. 10A is a block diagram illustrating details of a permutation
problem solving section shown in Fig. 9; Fig. 10B is a block diagram
illustrating details of a permutation correcting section shown in Fig. 10A;

Fig. 11 is a flowchart outlining a whole process performed in the
signal separating apparatus according to the second embodiment;

Fig. 12 is a flowchart illustrating an example of step S58 in Fig. 11; 

Fig. 13 is a flowchart illustrating an example of step S58 in Fig. 11;

Fig. 14A shows conditions of sound source separation experiments
according to the first and second embodiments; Fig. 14B shows the results of
the sound source separation experiments according to the first and second
embodiments;

Fig. 15A shows conditions of sound source separation experiments
according to the first and second embodiments; Fig. 15B shows the results of
the sound source separation experiments according to the first and second
embodiments;

Fig. 16 is a block diagram illustrating a configuration of a signal
separating apparatus according to a third embodiment;

Fig. 17A is a block diagram illustrating a detailed configuration of a
target signal selecting section in Fig. 16; Fig. 17B is a block diagram
illustrating a detailed configuration of a basis vector clustering section in
Fig. 17A;

Fig. 18A is a block diagram illustrating a detailed configuration of a
time-frequency masking section in Fig. 16; Fig. 18B is a block diagram
illustrating details of a mask generating section in Fig. 18A;

Fig. 19 is a flowchart outlining a whole signal separation process 
according to the third embodiment; 

Fig. 20 is a flowchart illustrating details of processing in a target 
signal selection section according to the third embodiment; 

Fig. 21A is a flowchart illustrating details of frequency
normalization at step S112; Fig. 21B is a flowchart illustrating details of
norm normalization at step S113;

Fig. 22 is a flowchart illustrating details of a process for selecting a
selection signal (step S115);

Fig. 23 is a flowchart illustrating details of step S104 in Fig. 19;

Fig. 24A illustrates time-frequency masks M(f, τ) calculated for two
real-number parameters θ_T and g according to Equation (46); Fig. 24B shows
coexistence of a vector V(f)·H_1(f) corresponding to a target signal
(I(f) = 1) with vectors V(f)·H_2(f) and V(f)·H_3(f) that correspond to
interfering signals at a certain time-frequency position (f, τ);

Fig. 25 is a block diagram illustrating a signal separating apparatus 
according to a fourth embodiment; 

Fig. 26 is a flowchart illustrating a process performed in the signal 
separating apparatus according to the fourth embodiment; 
Fig. 27 is a block diagram illustrating a signal separating apparatus
according to a fifth embodiment;

Fig. 28A is a block diagram showing a detailed configuration of a
time-frequency masking section in Fig. 27; Fig. 28B is a block diagram
showing a detailed configuration of a mask generating section in Fig. 28A;

Fig. 29 is a flowchart illustrating a process for generating a
time-frequency mask according to a fifth embodiment;

Fig. 30A is a flowchart illustrating details of step S171 in Fig. 29;
Fig. 30B is a flowchart illustrating details of step S172 in Fig. 29;

Fig. 31A shows conditions of experiments for demonstrating effects
of the third and fourth embodiments; Fig. 31B is a table showing average
improvements in SIR when only ICA is used (the fourth embodiment) and
when time-frequency masking is used in combination with ICA (the third
embodiment);

Fig. 32 is a block diagram illustrating a signal separating apparatus
according to a sixth embodiment;

Fig. 33 is a block diagram illustrating details of a signal separating
section in Fig. 32;

Fig. 34 is a flowchart outlining a whole process performed in the
signal separating apparatus according to the sixth embodiment;

Fig. 35A is a flowchart illustrating details of processing at step
S202 shown in Fig. 34; Fig. 35B is a flowchart illustrating details of
processing at step S203 shown in Fig. 34;

Fig. 36 is a flowchart illustrating details of processing at step S205
shown in Fig. 34;

Fig. 37A is a complex plane used for illustrating the relation
between an element X_q''(f, τ) of a norm-normalized vector X''(f, τ) at each
value of parameter "d" and its argument arg[X_q''(f, τ)] when d_max/2 > d;
Fig. 37B is a complex plane used for illustrating the relation between an
element X_q''(f, τ) of a norm-normalized vector X''(f, τ) at each value of
parameter "d" and its argument arg[X_q''(f, τ)] when d_max/2 < d < d_max;

Fig. 38A is a complex plane used for illustrating the relation
between an element X_q''(f, τ) of a norm-normalized vector X''(f, τ) at each
value of parameter "d" and its argument arg[X_q''(f, τ)] when d = d_max;
Fig. 38B is a complex plane used for illustrating the relation between an
element X_q''(f, τ) of a norm-normalized vector X''(f, τ) at each value of
parameter "d" and its argument arg[X_q''(f, τ)] when d > d_max;

Fig. 39A shows conditions of sound source separation experiments
according to the sixth embodiment; Fig. 39B shows results of the sound
source separation experiments according to the sixth embodiment;

Fig. 40A shows conditions of sound source separation experiments
according to the sixth embodiment; Fig. 40B shows results of the sound
source separation experiments according to the sixth embodiment; and

Fig. 41A shows conditions of sound source separation experiments
according to the sixth embodiment; Fig. 41B shows results of the sound
source separation experiments according to the sixth embodiment.
Description of symbols 

[0042] 1, 10, 200, 1001, 1200, 1300, 2001: Signal separating apparatus 

BEST MODES FOR CARRYING OUT THE INVENTION

[0043] Embodiments of the present invention will be described below with
reference to the accompanying drawings.

[Principles]

The principles of the present invention will be described first.

Fig. 1 is a block diagram illustrating a functional configuration of a
signal separating apparatus 1 incorporating principles of the present
invention. The signal separating apparatus 1 may be configured on a
well-known von Neumann-type computer by causing the computer to execute a
predetermined program as will be described later.

The signal separating apparatus 1 separates a mixture of source
signals originated from multiple signal sources into the source signals. As
shown in Fig. 1, the signal separating apparatus 1 has a frequency domain
transforming section 2, a complex vector generating section 3, a normalizing
section 4, and a clustering section 5. The normalizing section 4 includes a
first normalizing section 4a which normalizes the argument of each element
of a complex vector by using one particular element of that complex vector as
the reference, a second normalizing section 4b which divides the argument of
each element normalized by the first normalizing section 4a by a value
proportional to a frequency, and a third normalizing section 4c which
normalizes the norm of a vector consisting of the elements normalized by the
second normalizing section 4b to a predetermined value. The first and
second normalizing sections 4a and 4b eliminate the frequency dependence of
complex vectors (frequency normalization).

[0044] When signal separation is performed by the signal separating
apparatus 1, mixed signals (signals in the time domain) observed by multiple
sensors are first inputted to the frequency domain transforming section 2.
The frequency domain transforming section 2 uses a transformation such as
the short-time discrete Fourier transformation (STFT) to transform the mixed
signals (signals in the time domain) observed by the multiple sensors into
mixed signals in the frequency domain. Then, the complex vector
generating section 3 uses the mixed signals in the frequency domain to
generate a complex vector consisting of complex-number elements. The
normalizing section 4 then normalizes the complex vector to generate a
normalized vector from which the frequency dependence of the complex vector
is excluded.
[0045] In the normalization in the example in Fig. 1, the first normalizing
section 4a first normalizes the argument of each element of a complex vector
at each time-frequency by using one particular element of that complex vector
as the reference. As a result, the argument of each element of the complex
vector will depend only on the relative position of the signal source with
respect to sensors and on the frequency of the signal source without
depending on the phase and amplitude of the source signal (details will be
described later). Then, the second normalizing section 4b divides the
argument of each element normalized by the first normalizing section 4a by a
value proportional to the frequency. As a result, the frequency dependence
of the elements of each complex vector is eliminated and the complex vector
is normalized to a vector that is dependent only on the relative position of
each signal source with respect to each sensor. Then, the third normalizing
section 4c normalizes the norm of the vector consisting of the elements
normalized by the second normalizing section 4b to a predetermined number.
[0046] Then, the clustering section 5 groups the vectors thus normalized
into clusters. These clusters are dependent only on the relative positions of
the signal sources with respect to the sensors. The separated signal
generating section 6 uses the clusters to perform any of various types of
signal separation to generate separated signals in the frequency domain.
Finally, a time domain transforming section transforms the separated signals
in the frequency domain into separated signals in the time domain.

As has been described, the generation of the clusters does not
require obtaining precise information about the positions of the sensors
beforehand. Furthermore, information about signals observed at all sensors
is used for generating the clusters. That is, according to the present
invention, information obtained from signals observed by multiple sensors
can be used in a simple and efficient manner to perform signal separation.

[0047] It is possible to generate clusters that are dependent only on the
relative positions of signal sources with respect to sensors by clustering with
some additional arrangements without normalizing the norm. However, in
order to simplify clustering, it is preferable to normalize the norm by the
third normalizing section 4c.

Embodiments of the present invention will be described below.

[First embodiment (example of the first aspect of the present invention)]

The first embodiment of the present invention will be described.
[0048] The first embodiment accurately solves the permutation problem in
accordance with the principles described above, without needing to obtain
precise information about sensor positions beforehand or to perform
complicated operations. It should be noted that "basis vectors" described
later correspond to the "complex vectors" mentioned above.

<Hardware configuration>

Fig. 2 is a block diagram showing a hardware configuration of a
signal separating apparatus 10 according to the first embodiment.

As shown in Fig. 2, the signal separating apparatus 10 in this
example includes a CPU (Central Processing Unit) 10a, an input unit 10b, an
output unit 10c, an auxiliary storage device 10f, a RAM (Random Access
Memory) 10d, a ROM (Read Only Memory) 10e, and a bus 10g.
[0049] The CPU 10a in this example includes a control section 10aa, a
processing section 10ab, and a register 10ac and performs various operations
in accordance with programs read in the register 10ac. The input unit 10b in
this example may be an input port, keyboard, or mouse through which data is
inputted; the output unit 10c may be an output port or display through which
data is outputted. The auxiliary storage device 10f, which may be a hard
disk, MO (Magneto-Optical disc), or semiconductor memory, has a signal
separating program area 10fa which stores a signal separating program for
executing signal separation of the first embodiment and a data area 10fb
which stores various kinds of data such as time-domain mixed signals observed
by sensors. The RAM 10d, which may be an SRAM (Static Random Access Memory)
or DRAM (Dynamic Random Access Memory), has a signal separating program
area 10da in which the signal separating program is written and a data area
10db in which various kinds of data are written. The bus 10g in this
example interconnects the CPU 10a, input unit 10b, output unit 10c, auxiliary
storage device 10f, RAM 10d, and ROM 10e in such a manner that they can
communicate with one another.

[0050] <Cooperation between hardware and software>

The CPU 10a in this example writes the signal separating program
stored in the signal separating program area 10fa in the auxiliary storage
device 10f into the signal separating program area 10da in the RAM 10d in
accordance with a read OS (Operating System) program. Similarly, the CPU
10a writes various kinds of data such as time-domain mixed signals stored in
the data area 10fb in the auxiliary storage device 10f into the data area 10db
in the RAM 10d. The CPU 10a also stores in the register 10ac the addresses
on the RAM 10d at which the signal separating program and the data are
written. The control section 10aa in the CPU 10a sequentially reads the
addresses stored in the register 10ac, reads the program and data from the
areas on the RAM 10d indicated by the read addresses, causes the processing
section 10ab to sequentially execute operations described in the program, and
stores the results of the operations in the register 10ac.

[0051] Fig. 3 is a block diagram showing a signal separating apparatus 10
configured by the signal separating program being read by the CPU 10a. Fig.
4A is a block diagram illustrating details of the permutation problem solving
section 140 shown in Fig. 3; and Fig. 4B is a block diagram illustrating
details of the basis vector normalizing section 142 shown in Fig. 4A.

As shown in Fig. 3, the signal separating apparatus 10 includes a
memory 100, a frequency domain transforming section 120, a separation
matrix computing section 130, a permutation problem solving section 140, a
separated signal generating section 150, a time domain transforming section
160, and a control section 170. The permutation problem solving section
140 in this example has an inverse matrix computing section 141 (which
corresponds to the "complex vector generating section"), a basis vector
normalizing section 142 (which corresponds to the "normalizing section"), a
clustering section 143, a permutation computing section 144, and a sorting
section 145. The basis vector normalizing section 142 has a frequency
normalizing section 142a and a norm normalizing section 142b. The
frequency normalizing section 142a includes a first normalizing section 142aa
and a second normalizing section 142ab. The control section 170 has a
temporary memory 171.

[0052] The memory 100 and the temporary memory 171 correspond to the
register 10ac, the data area 10fb in the auxiliary storage device 10f, or the
data area 10db in the RAM 10d. The frequency domain transforming section 120,
the separation matrix computing section 130, the permutation problem
solving section 140, the separated signal generating section 150, the time
domain transforming section 160, and the control section 170 are configured
by the OS program and the signal separating program read by the CPU 10a.

The dashed arrows in Figs. 3 and 4 represent theoretical information
flows whereas the solid arrows represent actual data flows. Arrows
representing data flows to and from the control section 170 are omitted from
Figs. 3 and 4. Arrows representing actual data flows are also omitted from
Fig. 4.

[0053] <Processing>

Processing performed in the signal separating apparatus 10
according to the first embodiment will be described below. In the following
description, a situation will be dealt with in which N source signals are mixed
and observed by M sensors. It is assumed that mixed signals x_q(t) (q = 1,
..., M) in the time domain observed by the sensors are stored in memory area
101 of the memory 100 and that parameters, namely, the signal transmission
speed c, a reference value Q (a suffix representing one reference sensor
selected from among the M sensors) chosen from natural numbers smaller than
or equal to M, and a real number "d", are stored in memory area 107 in
preprocessing.
[0054] Fig. 5 is a flowchart outlining a whole process performed in the
signal separating apparatus 10 according to the first embodiment. With
reference to Fig. 5, processing performed in the signal separating apparatus
10 in this embodiment will be described below.

[Processing by the frequency domain transforming section 120]

First, the frequency domain transforming section 120 reads mixed
signals x_q(t) in the time domain from memory area 101 of the memory 100,
transforms them into time-series signals at each frequency (which are referred
to as "frequency-domain mixed signals") X_q(f, τ) (q = 1, ..., M) by using a
transform such as the short-time discrete Fourier transformation, and stores
them in memory area 102 of the memory 100 (step S1).
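As a hedged illustration of step S1, a short-time discrete Fourier
transformation may be sketched in Python as below. The Hann window, the
frame length, the hop size, and the function name stft are assumptions chosen
for this sketch; the description above only requires some transform of this
kind. A direct DFT is used so that the sketch is self-contained, not because
it is efficient.

```python
import cmath
import math

def stft(x, frame_len, hop):
    """Naive short-time discrete Fourier transform of a real sequence x.

    Returns X[tau][k]: frame index tau, frequency bin k (f = k * f_s / frame_len).
    """
    # Hann window to reduce spectral leakage (an assumed, common choice).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len)
        ]
        frames.append(spectrum)
    return frames

# Usage: a tone sitting exactly on bin 4 of a 32-point frame.
L = 32
tone = [math.cos(2 * math.pi * 4 * n / L) for n in range(128)]
X = stft(tone, frame_len=L, hop=16)
peak_bin = max(range(L // 2), key=lambda k: abs(X[0][k]))
```

Each row X[τ] plays the role of the frequency-domain mixed signal of one
sensor at discrete time τ; in the apparatus the same transform would be
applied to every sensor signal x_q(t).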
[0055] [Processing by the separation matrix computing section 130]

Then, the separation matrix computing section 130 reads the
frequency-domain mixed signals X_q(f, τ) from memory area 102 of the
memory 100. After reading the frequency-domain mixed signals X_q(f, τ),
the separation matrix computing section 130 uses a mixed-signal vector
X(f, τ) = [X_1(f, τ), ..., X_M(f, τ)]^T consisting of those signals to perform
Independent Component Analysis (ICA) to calculate a first separation matrix
W(f) and separated signal vectors Y(f, τ) = [Y_1(f, τ), ..., Y_N(f, τ)]^T. The
calculated first separation matrix W(f) is stored in memory area 103 in the
memory 100 (step S2).

[0056] Here, the first separation matrix W(f) calculated by the separation
matrix computing section 130 includes ambiguity of the order. Therefore,
the permutation problem solving section 140 resolves the ambiguity of the
order of the first separation matrix W(f) to obtain a second separation
matrix W'(f).

[Processing by the permutation problem solving section 140]

First, the inverse matrix computing section 141 reads the first
separation matrix W(f) from memory area 103 of the memory 100, calculates
the Moore-Penrose generalized inverse matrix W^+(f) = [A_1(f), ..., A_N(f)]
of the matrix, and stores the basis vectors A_p(f) = [A_1p(f), ..., A_Mp(f)]^T
that constitute the Moore-Penrose generalized inverse matrix in memory area
104 (step S3). If M = N, W^+(f) is identical to the inverse matrix W^-1(f).

[0057] Then, the basis vector normalizing section 142 reads the basis
vectors A_p(f) (p = 1, ..., N, f = 0, f_s/L, ..., f_s(L - 1)/L) from memory
area 104 of the memory 100, normalizes them into normalized basis vectors
A_p''(f), and stores them in memory area 106 of the memory 100 (step S4). It
should be noted that the basis vector normalizing section 142 normalizes all
basis vectors A_p(f) (p = 1, ..., N, f = 0, f_s/L, ..., f_s(L - 1)/L) into
normalized basis vectors A_p''(f) that are not dependent on frequencies but
only on the positions of the signal sources. Consequently, when they are
clustered, each of the clusters will correspond to a signal source. If the
normalization is not properly performed, clusters are not generated. The
normalization in this embodiment consists of two steps: frequency
normalization and norm normalization. The frequency normalization is
performed by the frequency normalizing section 142a (Fig. 4B) to normalize
basis vectors into frequency-normalized vectors that are independent of
frequency. The norm normalization is performed by the norm normalizing
section 142b to normalize the frequency-normalized vectors into normalized
basis vectors whose norm has a predetermined value (1 in this example). These
normalization operations will be detailed later.
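The norm normalization step alone may be sketched as follows. This is a
minimal Python sketch, assuming the Euclidean norm and the predetermined
value 1; the function name norm_normalize and the sample vectors are
hypothetical, and a zero vector is not handled.

```python
import math

def norm_normalize(v):
    """Scale a complex vector so that its Euclidean norm becomes 1."""
    norm = math.sqrt(sum(abs(e) ** 2 for e in v))
    return [e / norm for e in v]

a = [3 + 0j, 4j]          # norm 5
b = [0.3 + 0j, 0.4j]      # same direction, norm 0.5
a_n = norm_normalize(a)
b_n = norm_normalize(b)
```

Vectors that differ only by a positive real scale map to the same point,
which is why the amplitude scaling left by the preceding steps no longer
spreads the members of one cluster apart.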

[0058] Then, the clustering section 143 reads the normalized basis vectors
A_p''(f) from memory area 106 of the memory 100, clusters the normalized
basis vectors A_p''(f) into N clusters C_k (k = 1, ..., N), and stores
information identifying the clusters C_k and their centroids (center
vectors) η_k in memory areas 108 and 109 of the memory 100, respectively
(step S5). The clustering is performed so that the total sum U of the sums
of squares U_k of the distances between the elements (normalized basis
vectors A_v''(f)) of each cluster C_k and the centroid η_k of the cluster C_k
[0059] [Formula 13]

\[ U = \sum_{k=1}^{N} U_k \]
\[ U_k = \sum_{A_v''(f) \in C_k} \| A_v''(f) - \eta_k \|^2 \]

is minimized. The minimization can be performed effectively by using the
k-means clustering described in Non-patent literature 6, for example. The
centroid η_k of each cluster C_k can be calculated by
[0060] [Formula 14]

\[ \eta_k = \frac{\sum_{A_v''(f) \in C_k} A_v''(f) / |C_k|}{\left\| \sum_{A_v''(f) \in C_k} A_v''(f) / |C_k| \right\|} \]

where |C_k| is the number of elements (normalized basis vectors A_v''(f)) of
the cluster C_k. While the distance used here is the square of the Euclidean
distance, it may be the Minkowski distance, which is a generalization of the
Euclidean distance. The reason why the normalized basis vectors A_p''(f)
form clusters will be described later.
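One possible way to carry out such clustering is sketched below in Python.
It is a simplified k-means loop, assuming that the centroid is the member
mean rescaled to unit norm and that initial centroids are supplied by the
caller; the function names, the fixed iteration count, and the sample data
are hypothetical and are not taken from Non-patent literature 6.

```python
import math

def centroid(cluster):
    """Member mean rescaled to unit norm (assumed reading of Formula 14)."""
    m = len(cluster[0])
    mean = [sum(v[q] for v in cluster) / len(cluster) for q in range(m)]
    norm = math.sqrt(sum(abs(e) ** 2 for e in mean))
    return [e / norm for e in mean]

def sq_dist(u, v):
    """Square of the Euclidean distance between two complex vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, centroids, iterations=20):
    """Alternate assignment and centroid update to reduce U (Formula 13)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in vectors:
            k = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
            clusters[k].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [centroid(c) if c else old
                     for c, old in zip(clusters, centroids)]
    total = sum(sq_dist(v, eta) for c, eta in zip(clusters, centroids) for v in c)
    return clusters, centroids, total

# Four approximately unit-norm vectors lying near two distinct directions.
data = [[1 + 0j, 0j], [0.98 + 0j, 0.2j], [0j, 1 + 0j], [0.2j, 0.98 + 0j]]
clusters, centroids, total = kmeans(data, [data[0], data[2]])
```

The result of k-means depends on the initial centroids, so a practical
implementation would typically try several initializations and keep the one
with the smallest U.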

Then, the permutation computing section 144 reads the normalized
basis vectors A_p''(f) from memory area 106 of the memory 100 and the
centroids η_k of clusters C_k from memory area 109. The permutation
computing section 144 then uses them to calculate a permutation Π_f (a
bijective mapping function from {1, 2, ..., N} to {1, 2, ..., N}) used for
rearranging the elements of the first separation matrix W(f) for each
frequency f and stores it in memory area 110 of the memory 100 (step S6).
The permutation Π_f is determined by
[0061] [Formula 15]

\[ \Pi_f = \operatorname{argmin}_{\Pi} \sum_{k=1}^{N} \| \eta_k - A''_{\Pi(k)}(f) \|^2 \quad \cdots (13) \]

where "argmin_Π ·" represents the Π that minimizes "·" and "A_Π(k)''(f)"
represents the normalized basis vectors that are to be rearranged into
normalized basis vectors A_k''(f) by Π. That is, Π_f causes the Π(k)-th
normalized basis vector A_Π(k)''(f) to be the normalized basis vector
A_k''(f) in the k-th column. The permutation Π_f can be determined according
to Equation (13) by calculating
[0062] [Formula 16]

\[ \sum_{k=1}^{N} \| \eta_k - A''_{\Pi(k)}(f) \|^2 \]

for all possible permutations Π (N! permutations), for example, and by
determining the Π corresponding to its minimum value as the permutation Π_f.
An example of this procedure is given below.

[Example 1 of determination of permutation Π_f]

It is assumed here that the number N of signal sources is 3 and the
squares of the distances between the normalized basis vectors A_1''(f),
A_2''(f), and A_3''(f) at a frequency f and the centroids η_1, η_2, and η_3
are as shown in the following table.

[0063] [Table 1]

         A_1''(f)   A_2''(f)   A_3''(f)
η_1      0.85       0.1        0.7
η_2      0.9        0.6        0.2
η_3      0.15       0.8        0.95

Here, the permutation obtained according to Equation (13) is

Π_f: [1, 2, 3] → [2, 3, 1]

because the combinations

\[ \| \eta_1 - A''_{\Pi(1)}(f) \|^2 = \| \eta_1 - A''_2(f) \|^2 = 0.1 \]
\[ \| \eta_2 - A''_{\Pi(2)}(f) \|^2 = \| \eta_2 - A''_3(f) \|^2 = 0.2 \]
\[ \| \eta_3 - A''_{\Pi(3)}(f) \|^2 = \| \eta_3 - A''_1(f) \|^2 = 0.15 \]

minimize

[0064] [Formula 17]

\[ \sum_{k=1}^{3} \| \eta_k - A''_{\Pi(k)}(f) \|^2 \]

(End of the description of Example 1 of determination of permutation Π_f)
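The exhaustive search of Example 1 can be reproduced with the short Python
sketch below. The function name best_permutation and the zero-based
indexing are assumptions of this sketch; the squared distances are those
listed in Table 1.

```python
from itertools import permutations

# Squared distances from Table 1: dist[k][i] = ||eta_(k+1) - A_(i+1)''(f)||^2
dist = [
    [0.85, 0.10, 0.70],
    [0.90, 0.60, 0.20],
    [0.15, 0.80, 0.95],
]

def best_permutation(dist):
    """Return the zero-based permutation minimizing the sum in Equation (13)."""
    n = len(dist)
    return min(permutations(range(n)),
               key=lambda perm: sum(dist[k][perm[k]] for k in range(n)))

perm = best_permutation(dist)
pi_f = [i + 1 for i in perm]                      # one-based, as in the text
cost = sum(dist[k][perm[k]] for k in range(3))    # 0.1 + 0.2 + 0.15
```

Here pi_f evaluates to [2, 3, 1], in agreement with the permutation stated
in Example 1. All N! candidates are visited, so this form is practical only
for a small number of sources.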

However, this procedure will be unrealistic if N is large, because the
number of candidate permutations grows as N!. Therefore, an approximation
method may be used in which the A_Π(k)''(f) that minimize
||η_k - A_Π(k)''(f)||² are selected one by one in such a manner that there
are no overlaps, and a permutation that transfers the selected A_Π(k)''(f) to
the normalized basis vector A_k''(f) is chosen as the permutation Π_f. A
procedure for determining the permutation Π_f using this approximation
method under the same conditions given in Example 1 of determination of
permutation Π_f will be described below.

[0065] [Example 2 of determination of permutation Π_f]

First, because the minimum square of distance in Table 1 is 0.1 (the
square of the distance between the normalized basis vector A_2''(f) and
centroid η_1), Π(1) = 2 is chosen. Then, the row and column relating to the
normalized basis vector A_2''(f) and centroid η_1 are deleted as shown below.

[0066] [Table 2]

         A_1''(f)   A_2''(f)   A_3''(f)
η_1      -          -          -
η_2      0.9        -          0.2
η_3      0.15       -          0.95

Because the minimum square of distance in Table 2 is 0.15 (the
square of the distance between the normalized basis vector A_1''(f) and
centroid η_3), Π(3) = 1 is chosen. Finally, the remainder, 3, is assigned to
Π(2). (End of the description of Example 2 of determination of permutation
Π_f)
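The one-by-one selection of Example 2 may be sketched in Python as follows.
The function name greedy_permutation and the zero-based bookkeeping are
assumptions of this sketch; the squared distances are again those of
Table 1.

```python
def greedy_permutation(dist):
    """Pick the smallest remaining squared distance, then drop its row and column."""
    n = len(dist)
    free_rows, free_cols = set(range(n)), set(range(n))
    perm = [None] * n
    while free_rows:
        k, i = min(((k, i) for k in free_rows for i in free_cols),
                   key=lambda ki: dist[ki[0]][ki[1]])
        perm[k] = i            # centroid k is matched with basis vector i
        free_rows.remove(k)
        free_cols.remove(i)
    return perm

dist = [
    [0.85, 0.10, 0.70],
    [0.90, 0.60, 0.20],
    [0.15, 0.80, 0.95],
]
pi_f = [i + 1 for i in greedy_permutation(dist)]   # one-based, as in the text
```

For this table the greedy choice gives the same permutation as the
exhaustive search. In general the greedy result is only an approximation
and may differ from the minimizer of Equation (13).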
Then, the sorting section 145 reads the first separation matrix W(f)
from memory area 103 of the memory 100 and the permutation Π_f from
memory area 110. The sorting section 145 rearranges the rows of the first
separation matrix W(f) in accordance with the permutation Π_f to generate a
second separation matrix W'(f) and stores it in memory area 111 of the memory
100 (step S7). The rearrangement of the first separation matrix W(f)
according to the permutation Π_f means that a rearrangement equivalent to the
rearrangement of the elements A_Π(k)''(f) to the elements A_k''(f) in the
Moore-Penrose generalized inverse W^+(f) described above is performed on
the first separation matrix W(f). That is, the first separation matrix W(f) is
rearranged in such a manner that the Π_f(k)-th row of the first separation
matrix W(f) becomes the k-th row of the second separation matrix W'(f). In
Examples 1 and 2 of determination of permutation Π_f, the second, third,
and first rows of the first separation matrix W(f) become the first, second,
and third rows, respectively, of the second separation matrix W'(f).
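The row rearrangement of step S7 amounts to a simple indexing operation, as
the following Python sketch suggests. The function name sort_rows and the
placeholder matrix are hypothetical; the permutation is the one obtained in
Examples 1 and 2.

```python
def sort_rows(w, pi_f):
    """Row pi_f[k] of W(f) becomes row k of W'(f); pi_f is one-based as in the text."""
    return [w[p - 1] for p in pi_f]

# Placeholder first separation matrix whose rows are easy to tell apart.
w = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
w_sorted = sort_rows(w, [2, 3, 1])
```

The second, third, and first rows of w appear as the first, second, and
third rows of w_sorted.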
[0067] [Processing by the separated signal generating section 150]

Then, the separated signal generating section 150 reads the mixed
signals X_q(f, τ) in the frequency domain from memory area 102 of the memory
100 and the second separation matrix W'(f) from memory area 111. The
separated signal generating section 150 then uses the mixed-signal vector
X(f, τ) = [X_1(f, τ), ..., X_M(f, τ)]^T consisting of the mixed signals
X_q(f, τ) in the frequency domain and the second separation matrix W'(f) to
calculate a separated signal vector

Y(f, τ) = W'(f)·X(f, τ)

and stores the frequency-domain signals Y_p(f, τ) which are the elements of
the separated signal vector (which are referred to as "frequency-domain
separated signals") in memory area 112 of the memory 100 (step S8).
[0068] [Processing by the time domain transforming section 160]

Finally, the time domain transforming section 160 reads the
frequency-domain separated signals Y_p(f, τ) from memory area 112 of the
memory 100, transforms them into separated signals y_p(t) in the time domain
one by one for each suffix p (for each Y_p(f, τ)) by using a transformation
such as the short-time inverse Fourier transformation, and stores the
separated signals y_p(t) in the time domain in memory area 113 of the memory
100 (step S9).

[Details of normalization (details of step S4)]

Details of the above-mentioned normalization (step S4) performed
by the basis vector normalizing section 142 will be described below.
[0069] Fig. 6 is a flowchart illustrating details of the normalization
process.

First, the control section 170 (Fig. 3) assign 1 to parameter p and 
stores it in the temporary memory 171 (step Sll). The control section 170 
also assigns 1 to parameter q and stores it in the temporary memory 171 (step 
20 SI 2). Then, the frequency normalizing section 142a (Fig. 4) reads the 

parameters d, c, and Q described above from memory area 107 of the memory 
100, reads parameters p and q from the temporary memory 171, and, for the 
elements A qp (f) of the basis vector A p (f), calculates 
[0070] [Formula 18]

A_{qp}'(f) = |A_{qp}(f)| \exp\left[ j \frac{\arg[A_{qp}(f)/A_{Qp}(f)]}{4 f c^{-1} d} \right]   ... (14)

then stores the calculated A_qp'(f) in memory area 105 of the memory 100 as
the elements A_qp'(f) of the frequency-normalized vector A_p'(f) (step S13).
Here, arg[·] represents the argument of · and j is the imaginary unit.

In particular, the first normalizing section 142aa of the frequency
normalizing section 142a first normalizes the argument of each element A_qp(f)
of a basis vector A_p(f) on the basis of a particular element A_Qp(f) of the
basis vector A_p(f) by
[0071] [Formula 19]

A_{qp}'''(f) = |A_{qp}(f)| \exp\{ j \cdot \arg[A_{qp}(f)/A_{Qp}(f)] \}   ... (15)

Then, the second normalizing section 142ab of the frequency
normalizing section 142a divides the argument of each of the elements
A_qp'''(f) normalized by the first normalizing section 142aa by a value
4fc^{-1}d proportional to the frequency f as
[0072] [Formula 20]

A_{qp}'(f) = |A_{qp}'''(f)| \exp\left[ j \frac{\arg[A_{qp}'''(f)]}{4 f c^{-1} d} \right]   ... (16)

Then, the control section 170 determines whether the value of
parameter q stored in the temporary memory 171 satisfies q = M (step S14).
If q ≠ M, the control section 170 sets the calculation result q + 1 as the new
value of the parameter q, stores it in the temporary memory 171 (step S15),
and returns to step S13. On the other hand, if q = M, then the control section
170 determines whether p = N (step S16).

[0073] If p ≠ N, then the control section 170 sets the calculation result p
+ 1 as the new value of the parameter p, stores it in the temporary memory 171
(step S17), and then returns to step S12. On the other hand, if p = N, the
control section 170 assigns 1 to the parameter p, and stores it in the temporary
memory 171 (step S18). Then the norm normalizing section 142b starts
processing. The norm normalizing section 142b first reads the elements
A_qp'(f) of the frequency-normalized vector A_p'(f) from memory area 105 of the
memory 100, calculates
[0074] [Formula 21]

\|A_p'(f)\| = \sqrt{ \sum_{q=1}^{M} |A_{qp}'(f)|^2 }   ... (17)

to obtain the norm ||A_p'(f)|| of the frequency-normalized vector A_p'(f), and
stores the frequency-normalized vector A_p'(f) and its norm ||A_p'(f)|| in the
temporary memory 171 (step S19).

Then, the norm normalizing section 142b reads the
frequency-normalized vector A_p'(f) and its norm ||A_p'(f)|| from the temporary
memory 171, calculates

A_p''(f) = A_p'(f) / \|A_p'(f)\|   ... (18)

to obtain a normalized basis vector A_p''(f), and stores it in memory area 106 of
the memory 100 (step S20).

[0075] Then, the control section 170 determines whether the value of
parameter p stored in the temporary memory 171 satisfies p = N (step S21).
If p ≠ N, the control section 170 sets the calculation result p + 1 as the new
value of the parameter p, stores it in the temporary memory 171 (step S22),
and then returns to step S19. On the other hand, if p = N, the control section
170 terminates the processing at step S4.

The normalized basis vectors A_p''(f) thus generated do not depend on
frequency; they depend only on the positions of the signal
sources. Consequently, the normalized basis vectors A_p''(f) form clusters.
The reason will be described below.
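
As a non-limiting illustration, the normalization of one basis vector by Equations (14), (17), and (18) may be sketched in Python as follows. The function name normalize_basis_vector, the 0-based sensor index, and the sample values are assumptions made for this sketch; the frequency f is assumed to be greater than 0, since the divisor 4fc^{-1}d vanishes at f = 0.

```python
import cmath
import math

def normalize_basis_vector(A, Q, f, c, d):
    """Apply Eq. (14) and then Eq. (18) to one basis vector A_p(f).

    A: list of complex elements A_qp(f), indexed by sensor q (0-based here)
    Q: index of the reference sensor; f: frequency in Hz (must be > 0)
    c: propagation speed of the signals; d: the real parameter d
    """
    scale = 4.0 * f * d / c  # the value 4 f c^-1 d, proportional to f
    # Eq. (14): keep the modulus, reference the argument to A_Qp(f), divide by scale.
    A_freq = [abs(a) * cmath.exp(1j * cmath.phase(a / A[Q]) / scale) for a in A]
    # Eq. (17): norm of the frequency-normalized vector.
    norm = math.sqrt(sum(abs(a) ** 2 for a in A_freq))
    # Eq. (18): scale to unit norm.
    return [a / norm for a in A_freq]

# Toy basis vector for M = 3 sensors (values are made up for illustration).
A_p = [2j, 1 + 1j, -3]
A_norm = normalize_basis_vector(A_p, Q=0, f=1000.0, c=340.0, d=0.04)
```

The loops over p and q of Fig. 6 correspond here to calling the function once per basis vector, with the list comprehension running over the sensors q.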

[0076] [Reason why normalized basis vectors A_p''(f) form clusters]

Each of the elements A_qp(f) of a basis vector A_p(f) is proportional to
the frequency response H_qk from the signal source k corresponding to a source
signal p to a sensor q (that is, it is equal to the frequency response multiplied
by a complex scalar). These complex scalars change with discrete time (i.e.
with phase), whereas the relative value between the complex scalar
corresponding to the source signal p and sensor q and the complex scalar
corresponding to the source signal p and sensor Q does not change with
changing discrete time (provided that the frequency f is the same). That is, if
the frequency f is the same, the relative value between the argument of the
complex scalar corresponding to the source signal p and sensor q and the
argument of the complex scalar corresponding to the source signal p and
sensor Q is constant.

[0077] As described above, the first normalizing section 142aa of the
frequency normalizing section 142a normalizes the argument of each element
A_qp(f) of a basis vector A_p(f) on the basis of one particular element A_Qp(f) of
that basis vector A_p(f). Thus, uncertainty due to the phase of the complex
scalars mentioned above is eliminated and the argument of the element A_qp(f)
of the basis vector A_p(f) corresponding to the source signal p and sensor q is
represented as a value relative to the argument of the element A_Qp(f) of the
basis vector A_p(f) corresponding to the source signal p and sensor Q
(corresponding to the reference value Q). The relative value corresponding
to the argument of the element A_Qp(f) is represented as 0. The frequency
response from a signal source k to a sensor q is approximated using a
direct-wave model without reflections and reverberations. Then the
argument normalized by the first normalizing section 142aa is proportional to
both the arrival time difference of waves from the signal source k to the
sensors and the frequency f. The arrival time difference here is the difference
between the time taken for a wave from the signal source k to reach the sensor
q and the time taken for the wave to reach the reference sensor Q.

[0078] As has been described above, the second normalizing section 142ab
divides the argument of each element A_qp'''(f) normalized by the first
normalizing section 142aa by a value proportional to the frequency f. Thus,
the elements A_qp'''(f) are normalized to elements A_qp'(f) whose arguments
no longer depend on frequency. Consequently, according to the direct-wave
model, each of the normalized elements A_qp'(f) depends only on the arrival
time difference between the times at which the wave from the signal source k
reaches the sensors. The arrival time difference of the wave from the signal
source k to the sensors depends only on the relative positions of the signal
source k, sensor q, and reference sensor Q. Accordingly, the arguments of
the elements A_qp'(f) with the same signal source k, sensor q, and reference
sensor Q are the same even if the frequency varies. Thus, the
frequency-normalized vectors A_p'(f) are not dependent on the frequency f but
only on the position of the signal source k.

[0079] Therefore, by clustering the normalized basis vectors A_p''(f)
resulting from normalization of the norms of the frequency-normalized
vectors A_p'(f), clusters are generated, each of which corresponds to the same
signal source. Although the direct-wave model is not exactly satisfied in a
real environment because of reflections and reverberations, a sufficiently
good approximation can be obtained as shown in experimental results which
will be given later.

The reason why the normalized basis vectors A_p''(f) form clusters
will be described below with respect to a model. The impulse response
h_qk(r) in Equation (1) described earlier is approximated using a direct-wave
(near-field) mixture model and represented in the frequency domain as
[0080] [Formula 22]

H_{qk}(f) = \frac{1}{d_{qk}} \exp[ -j 2\pi f c^{-1} (d_{qk} - d_{Qk}) ]   ... (19)

where d_qk is the distance between a signal source k and a sensor q. The
attenuation 1/d_qk is determined by the distance d_qk and the delay (d_qk - d_Qk)/c is
determined by the distance normalized at the position of the reference sensor
Q.

If order ambiguity and scaling ambiguity in independent component
analysis (ICA) are taken into consideration, the following relation holds
between the basis vector A_p(f) and the vector H_k(f) consisting of frequency
responses from the signal source k to the sensors.

[0081] A_p(f) = ε_p H_k(f), A_qp(f) = ε_p H_qk(f) ... (20)
where ε_p is a complex scalar value representing the ambiguity of the scaling.
The possibility that suffixes k and p differ from each other represents the
ambiguity of the order. From Equations (16), (18), (19), and (20),

[0082] [Formula 23]

A_{qp}''(f) = \frac{1}{d_{qk} D} \exp\left[ -j \frac{\pi}{2} \frac{(d_{qk} - d_{Qk})}{d} \right], \quad D = \sqrt{ \sum_{i=1}^{M} \frac{1}{d_{ik}^2} }   ... (21)

As can be seen from this equation, the elements A_qp''(f) of the normalized
basis vector A_p''(f) are independent of the frequency f and dependent only on
the positions of the signal sources k and sensors q. Therefore, clustering the
normalized basis vectors A_p''(f) generates clusters, each corresponding to the
same signal source.
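
The frequency independence stated for Equation (21) can be checked numerically. The following Python sketch is given only as an illustration under stated assumptions: the distances, the propagation speed of 340 m/s, the scaling factors, and all function names are hypothetical, and the frequencies are chosen low enough that the phase of Equation (19) stays within (-π, π], so that arg[·] does not wrap.

```python
import cmath
import math

C = 340.0  # assumed propagation speed in m/s

def direct_wave(d_q, d_ref, f):
    """Eq. (19): attenuation 1/d_q and delay (d_q - d_ref)/c."""
    return cmath.exp(-2j * math.pi * f * (d_q - d_ref) / C) / d_q

def normalize(A, Q, f, d):
    """Eq. (14) followed by Eq. (18)."""
    scale = 4.0 * f * d / C
    A_freq = [abs(a) * cmath.exp(1j * cmath.phase(a / A[Q]) / scale) for a in A]
    norm = math.sqrt(sum(abs(a) ** 2 for a in A_freq))
    return [a / norm for a in A_freq]

dists = [1.00, 1.02, 1.03]  # hypothetical distances d_qk from one source to 3 sensors (m)
Q, d = 0, 0.04              # reference sensor index and parameter d (m)

def normalized_at(f, eps):
    """Basis vector eps * H_k(f) per Eq. (20), then normalized."""
    H = [direct_wave(dq, dists[Q], f) for dq in dists]
    return normalize([eps * h for h in H], Q, f, d)

# Two different frequencies and two different (arbitrary) scaling factors eps_p.
v_low = normalized_at(500.0, 0.7 * cmath.exp(0.3j))
v_high = normalized_at(3000.0, 2.5 * cmath.exp(-1.1j))

# Closed form of Eq. (21) for comparison.
D = math.sqrt(sum(1.0 / dq ** 2 for dq in dists))
closed_form = [cmath.exp(-1j * (math.pi / 2) * (dq - dists[Q]) / d) / (dq * D)
               for dq in dists]
```

Under these assumptions v_low, v_high, and closed_form agree to within rounding error, which is the property that allows vectors from different frequency bins to fall into the same cluster.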

The same applies to a near-field mixture model in which signal
attenuation is not taken into consideration. The convolutive mixture model
represented by Equation (1) given earlier is approximated with a near-field
mixture model in which attenuation is ignored and represented in the
frequency domain as
[0083] [Formula 24]

H_{qk}(f) = \exp[ -j 2\pi f c^{-1} (d_{qk} - d_{Qk}) ]   ... (22)



From Equations (16), (18), (20), and (22), it follows that

[0084] [Formula 25]

A_{qp}''(f) = \frac{1}{\sqrt{M}} \exp\left[ -j \frac{\pi}{2} \frac{(d_{qk} - d_{Qk})}{d} \right]   ... (23)

Again, the elements A_qp''(f) of the normalized basis vector A_p''(f) are
independent of the frequency f and dependent only on the positions of the
signal source k and sensor q.
Also, the same applies to a far-field mixture model as well as the
near-field mixture model. The convolutive mixture model represented by
Equation (1) mentioned above is approximated and represented in the
frequency domain as
[0085] [Formula 26]

H_{qk}(f) = \exp[ -j 2\pi f c^{-1} \|SE_q - SE_Q\| \cos\theta_k^{qQ} ]   ... (24)

Here, SE_q and SE_Q are vectors representing the positions of sensors q and Q,
and θ_k^{qQ} is the angle between the straight line connecting sensors q and Q and
the straight line connecting the center point of sensors q and Q and the signal
source k. From Equations (16), (18), (20), and (24),
[0086] [Formula 27]

A_{qp}''(f) = \frac{1}{\sqrt{M}} \exp\left[ -j \frac{\pi}{2} \frac{\|SE_q - SE_Q\| \cos\theta_k^{qQ}}{d} \right]   ... (25)

Again, the elements A_qp''(f) of the normalized basis vector A_p''(f) are
independent of the frequency f and dependent only on the positions of the
signal source k and sensor q.

Preferably, the value of the parameter d is d > d_max/2 (where d_max
represents the maximum distance between the reference sensor Q
corresponding to element A_Qp(f) and another sensor) from Equation (21),
more preferably, d ≥ d_max, and still more preferably, d = d_max. The reason
will be described below.

[0087] Figs. 7 and 8 are complex planes for illustrating the relation
between an element A_qp''(f) of a normalized basis vector and its argument
arg[A_qp''(f)] at each value of parameter d. The horizontal axis in Figs. 7 and
8 represents the real axis and the vertical axis represents the imaginary axis.

Fig. 7A is a complex plane in the case where d_max/2 ≥ d. From the
definition of d_max given above, the absolute value of d_qk - d_Qk for any q and k
is less than or equal to d_max. Therefore, if d_max/2 ≥ d, then
(π/2)·(d_qk - d_Qk)/d ≤ -π and (π/2)·(d_qk - d_Qk)/d ≥ π can hold.
Consequently, the arguments arg[A_qp''(f)] of A_qp''(f) represented by
Equation (21) can be distributed over a range wider than 2π,
α_1 ≤ arg[A_qp''(f)] ≤ α_2 (α_1 ≤ -π and α_2 ≥ π). Accordingly, the arguments
of elements A_qp''(f) of different normalized basis vectors can match and
consequently the different normalized basis vectors A_p''(f) can be clustered
into the same cluster in the clustering described above. Therefore, it is
desirable that d > d_max/2. However, if there is no sample of the normalized
basis vector A_p''(f) that falls in an argument overlapping range, no problem
arises even if d_max/2 ≥ d.

[0088] Fig. 7B shows a complex plane in the case where d_max/2 < d < d_max.
In this case, the relations -π < (π/2)·(d_qk - d_Qk)/d < -π/2 and
π/2 < (π/2)·(d_qk - d_Qk)/d < π can hold. Consequently, the arguments
arg[A_qp''(f)] of A_qp''(f) represented by Equation (21) can be distributed
over the range β_1 ≤ arg[A_qp''(f)] ≤ β_2 (-π < β_1 < -π/2 and π/2 < β_2 < π).
Therefore, in the ranges
-π < arg[A_qp''(f)] < -π/2 and π/2 < arg[A_qp''(f)] < π, it is possible that the
distance between different frequency-normalized vector elements does not
monotonically increase with increasing difference between the arguments
of the different frequency-normalized vector elements. This condition
can degrade the accuracy of the clustering described above. Therefore, it is
desirable that d ≥ d_max.

[0089] Fig. 8A is a complex plane in the case where d = d_max. Fig. 8B is a
complex plane in the case where d > d_max. If d > d_max, the relations
-π/2 < (π/2)·(d_qk - d_Qk)/d < 0 and 0 < (π/2)·(d_qk - d_Qk)/d < π/2 can hold.
Consequently, the arguments arg[A_qp''(f)] of A_qp''(f) represented by Equation
(21) are distributed over the range γ_1 ≤ arg[A_qp''(f)] ≤ γ_2 (-π/2 < γ_1 < 0
and 0 < γ_2 < π/2) as shown in Fig. 8B. As d increases, the distribution range
narrows and clusters are distributed more densely in the narrowed range. As a
result, the accuracy of the clustering described above degrades.

[0090] In contrast, when d = d_max, the relations -π/2 ≤ (π/2)·(d_qk - d_Qk)/d < 0
and 0 < (π/2)·(d_qk - d_Qk)/d ≤ π/2 can hold. Consequently, the arguments
arg[A_qp''(f)] of A_qp''(f) represented by Equation (21) are distributed over the
range -π/2 ≤ arg[A_qp''(f)] ≤ π/2 as shown in Fig. 8A. In this case the clusters
can be distributed in a range as wide as possible while keeping the relation in
which the distance between the frequency-normalized vector elements
monotonically increases with increasing difference between the arguments of
the elements. As a result, typically the accuracy of the clustering can be
improved.
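
The dependence of the argument range on the parameter d may be illustrated by the following Python sketch. It is only an illustration: the function names argument_bound and chord_distance and the value d_max = 0.04 m are assumptions, and unit-modulus elements are assumed when distances are compared.

```python
import cmath
import math

def argument_bound(d, d_max):
    """Largest |arg[A_qp''(f)]| possible under Eq. (21): (pi/2) * d_max / d."""
    return (math.pi / 2.0) * d_max / d

def chord_distance(delta):
    """Distance between two unit-modulus elements whose arguments differ by delta."""
    return abs(cmath.exp(1j * delta) - 1.0)

d_max = 0.04  # assumed maximum distance from the reference sensor (m)
for d in (0.015, 0.03, 0.04, 0.08):
    bound = argument_bound(d, d_max)
    print(f"d = {d:.3f} m: |arg| can reach {bound / math.pi:.2f} * pi")
```

With d = d_max the arguments stay within [-π/2, π/2], so two arguments differ by at most π, and chord_distance increases monotonically on [0, π]. For a smaller d the difference can exceed π, where chord_distance decreases again, which corresponds to the loss of monotonicity described for Fig. 7B.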

[Second embodiment (example of the first aspect of the invention)]

The second embodiment of the present invention will be described
below.

[0091] In the first embodiment, the permutation problem has been solved
by using information obtained from basis vectors. In the second
embodiment, the permutation problem is solved more accurately by
combining this information with information about envelopes of separated
signals as described in Japanese Patent Application Laid-Open No.
2004-145172 and H. Sawada, R. Mukai, S. Araki, S. Makino, "A Robust and
Precise Method for Solving the Permutation Problem of Frequency-Domain
Blind Source Separation," IEEE Trans. Speech and Audio Processing, Vol. 12,
No. 5, pp. 530-538, Sep. 2004 (hereinafter referred to as the "Reference
literatures"). In these literatures, information about the directions of signal
sources is used instead of basis vectors.

[0092] The following description focuses on differences from the first 
embodiment and description of the same elements as those in the first 
embodiment will be omitted. 

<Configuration> 

Fig. 9 is a block diagram showing an example of a signal separating
apparatus 200 according to the second embodiment. As in the first
embodiment, the signal separating apparatus 200 in the second embodiment is
configured when a signal separating program is read into a CPU 10a (Fig. 2).
Fig. 10A is a block diagram showing details of the permutation problem
solving section 240 shown in Fig. 9 and Fig. 10B is a block diagram showing
details of the permutation correcting section 247 shown in Fig. 10A. In Figs.
9 and 10, the same elements as those in the first embodiment are labeled with
the same reference numerals as those used in the first embodiment. The
dashed arrows in Figs. 9 and 10 represent theoretical information flows
whereas the solid arrows represent actual data flows. Arrows representing
flows of data inputted into and outputted from a control section 170 are
omitted from Figs. 9 and 10. Arrows representing actual data flows are also
omitted from Fig. 10.

[0093] A major difference of the second embodiment from the first
embodiment lies in the configuration of the permutation problem solving
section 240. The permutation problem solving section 240 in the second
embodiment is the same as the permutation problem solving section 140 in
the first embodiment, except that a permutation evaluating section 246 and a
permutation correcting section 247 are added in the second embodiment (Figs.
9 and 10A). The permutation evaluating section 246 evaluates the reliability
of a permutation on a frequency-by-frequency basis. If the reliability of a
permutation at a frequency is evaluated as low, the permutation correcting
section 247 calculates another permutation by using the envelope of separated
signals. The permutation correcting section 247 includes a determining
section 247a, a separated signal generating section 247b, an envelope
computing section 247c, a permutation recomputing section 247d, and a
re-sorting section 247e (Fig. 10B). In the second embodiment, the
permutation computing section 144 and the permutation correcting section
247 make up a "permutation computing section" as set forth in claim 4.
[0094] <Processing> 

Fig. 11 is a flowchart outlining the whole process performed in the
signal separating apparatus 200 according to the second embodiment. The
process performed in the signal separating apparatus 200 in the second
embodiment will be described with reference to the flowchart.

Steps S51 to S57 are the same as steps S1 to S7 in the first
embodiment and therefore the description thereof will be omitted. In the
second embodiment, after step S57, the reliability of a permutation Π_f for
each frequency is evaluated in the permutation evaluating section 246. For a
frequency for which the reliability of the permutation Π_f is evaluated as low,
the envelope of separated signals is used to calculate another permutation Π_f',
the rows of the second separation matrix W'(f) are rearranged, only for that
frequency, in accordance with the permutation Π_f' to generate a third
separation matrix W''(f), and the third separation matrix W''(f) is stored in
memory area 110 of a memory 100 (step S58). The processing will be
detailed later.

[0095] Then, a separated signal generating section 150 reads mixed signals
X_q(f, τ) in the frequency domain from memory area 102 of the memory 100
and the third separation matrix W''(f) from memory area 111. The separated
signal generating section 150 then uses a mixed-signal vector X(f, τ) =
[X_1(f, τ), ..., X_M(f, τ)]^T consisting of the frequency-domain mixed signals
X_q(f, τ) and the third separation matrix W''(f) to compute a separated signal
vector

Y(f, τ) = W''(f)·X(f, τ)
and stores frequency-domain separated signals Y_p(f, τ) in memory area 112 of
the memory 100 (step S59).

[0096] Finally, the time domain transforming section 160 reads the
frequency-domain separated signals Y_p(f, τ) from memory area 112 of the
memory 100, transforms them into separated signals y_p(t) in the time domain
for each individual suffix p, and stores the time-domain separated signals y_p(t)
in memory area 113 of the memory 100 (step S60).
[Details of processing at step S58]

Figs. 12 and 13 show a flowchart illustrating an example of
processing at step S58 in Fig. 11. Step S58 will be detailed with reference to
the flowchart.

[0097] First, a control section 170 assigns 0 to parameter f, makes a set F
an empty set, and stores information representing this in a temporary memory
171 (step S71). Then, the permutation evaluating section 246 evaluates the
reliability of a permutation Π_f stored in memory area 110 of the memory 100
for each frequency and stores the result of evaluation trust(f) in the temporary
memory 171 (step S72). The reliability of a permutation Π_f is said to be
high if the normalized basis vector A_p''(f) is sufficiently close to its
corresponding centroid η_k. Whether a normalized basis vector A_p''(f) is
sufficiently close to its corresponding centroid η_k can be determined on the
basis of whether the squared distance between the normalized basis vector
A_p''(f) and the centroid η_k is smaller than the variance U_k/|C_k| of cluster C_k:

U_k / |C_k| > \| \eta_k - A_{\Pi(k)}''(f) \|^2   ... (26)

At step S72, the permutation evaluating section 246 first reads the normalized
basis vector A_p''(f) from memory area 105 of the memory 100, the centroid η_k
from memory area 109, and the permutation Π_f from memory area 110. The
permutation evaluating section 246 then determines for each frequency f
whether Equation (26) is satisfied. If it is satisfied, the permutation evaluating
section 246 outputs and stores trust(f) = 1 in the temporary memory 171;
otherwise it outputs and stores trust(f) = 0 in the temporary memory 171.
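
The evaluation at step S72 may be sketched as follows in Python, purely as an illustration. The function names, the argument layout, and the toy values are hypothetical; in particular, the sketch assumes that Equation (26) must hold for every cluster k at a given frequency for trust(f) to be 1, which the description above does not state explicitly.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two (possibly complex) vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(u, v))

def trust(normalized_vectors, centroids, variances, perm):
    """Return 1 if Eq. (26) holds for every cluster k at this frequency, else 0.

    normalized_vectors[p]: normalized basis vector A_p''(f)
    centroids[k]: centroid eta_k; variances[k]: U_k / |C_k|
    perm[k]: index Pi_f(k) of the vector assigned to cluster k (0-based)
    """
    for k, eta in enumerate(centroids):
        if variances[k] <= squared_distance(eta, normalized_vectors[perm[k]]):
            return 0
    return 1

# Toy data for N = 2 sources and M = 2 sensors (values are made up).
centroids = [[1, 0], [0, 1]]
variances = [0.05, 0.05]
vectors = [[0.1, 0.9], [0.95, 0.05]]
reliable = trust(vectors, centroids, variances, perm=[1, 0])
unreliable = trust(vectors, centroids, variances, perm=[0, 1])
```

Here perm=[1, 0] pairs each vector with its nearby centroid, whereas perm=[0, 1] pairs each vector with the distant centroid and is therefore evaluated as unreliable.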
[0098] Then, the determining section 247a in the permutation correcting
section 247 reads the evaluation result trust(f) for each frequency f from the
temporary memory 171 and determines whether trust(f) = 1 (step S73). If
trust(f) = 0, the process proceeds to step S76. On the other hand, if trust(f) =
1, the control section 170 stores the union of sets F and {f} in the temporary
memory 171 as a new set F (step S74), the re-sorting section 247e stores
the second separation matrix W'(f) at the frequency f in memory area 111 of
the memory 100 as a third separation matrix W''(f) (step S75), and then the
process proceeds to step S76.

[0099] At step S76, the control section 170 determines whether the value
of parameter f stored in the temporary memory 171 satisfies the condition f =
(L - 1)f_s/L (step S76). If it does not satisfy the condition, the control section
170 stores a calculation result f + f_s/L as a new value of parameter f in the
temporary memory 171 (step S77), and then returns to step S72.

On the other hand, if the value of parameter f satisfies the condition
f = (L - 1)f_s/L, the separated signal generating section 247b selects one
frequency f that does not belong to set F. For this frequency f and the
frequencies g (where g ∈ F and |g - f| ≤ δ, and δ is a constant) that are in the
vicinity of the frequency f and belong to set F, the separated signal generating
section 247b reads mixed signals X(f, τ) = [X_1(f, τ), ..., X_M(f, τ)]^T and
X(g, τ) = [X_1(g, τ), ..., X_M(g, τ)]^T in the frequency domain from memory
area 102 of the memory 100, reads the second separation matrices W'(f) and
W'(g) from memory area 111, and uses

Y(f, τ) = W'(f)·X(f, τ)
Y(g, τ) = W'(g)·X(g, τ)
to compute separated signals Y(f, τ) = [Y_1(f, τ), ..., Y_N(f, τ)]^T and Y(g, τ) =
[Y_1(g, τ), ..., Y_N(g, τ)]^T, then stores them in the temporary memory 171 (step
S78).

[0100] Then, the envelope computing section 247c reads all the
frequency-domain separated signals Y_p(f, τ) and Y_p(g, τ) from the temporary
memory 171, calculates their envelopes

v_p^f(τ) = |Y_p(f, τ)|
v_p^g(τ) = |Y_p(g, τ)|

and stores them in the temporary memory 171 (step S79).

Then, the permutation recomputing section 247d computes the
maximum sum of correlations "cor" over the frequencies g whose difference
from the frequency f is less than or equal to δ
[0101] [Formula 28]

R_f = \max_{\Pi} \sum_{|g-f| \le \delta} \sum_{k=1}^{N} \mathrm{cor}( v_{\Pi(k)}^{f}, v_{\Pi'(k)}^{g} )

and stores it in the temporary memory (step S80). Here, Π' is a
predetermined permutation for frequency g. The correlation cor(Φ, Ψ) in
the equation represents the correlation between two signals Φ and Ψ, defined
as

\mathrm{cor}(\Phi, \Psi) = ( \langle \Phi \cdot \Psi \rangle - \langle \Phi \rangle \cdot \langle \Psi \rangle ) / ( \sigma_{\Phi} \cdot \sigma_{\Psi} )

where ⟨ζ⟩ is the time average of ζ, σ_Φ is the standard deviation of Φ, and
v_{Π(k)}^f represents the envelope to be rearranged into envelope v_k^f(τ) by Π.
That is, the envelope v_{Π(k)}^f in the Π(k)-th column becomes the k-th envelope
v_k^f(τ) in accordance with Π.

[0102] The permutation recomputing section 247d calculates a
permutation that maximizes the sum of the correlations cor as
[0103] [Formula 29]

\Pi_f' = \mathrm{argmax}_{\Pi} \sum_{|g-f| \le \delta} \sum_{k=1}^{N} \mathrm{cor}( v_{\Pi(k)}^{f}, v_{\Pi'(k)}^{g} )

and stores it in memory area 110 of the memory 100 (step S81). Here, Π' is
a permutation predetermined for frequency g and argmax_Π ν represents the Π
that maximizes ν.
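
The computations of steps S80 and S81 may be sketched as follows in Python. This is an illustration under assumptions, not the implementation of the embodiment: the function names are hypothetical, the envelopes of the neighboring frequencies g are assumed to be already arranged according to Π', the search over Π is an exhaustive enumeration of all N! permutations (a choice made for this sketch only), and each envelope is assumed to have non-zero variance.

```python
import itertools
import math

def cor(a, b):
    """Correlation (<a*b> - <a><b>) / (sigma_a * sigma_b) of two real sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    mean_ab = sum(x * y for x, y in zip(a, b)) / n
    sigma_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sigma_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    return (mean_ab - mean_a * mean_b) / (sigma_a * sigma_b)

def best_permutation(env_f, env_neighbors):
    """Return (R_f, Pi) maximizing the summed envelope correlations.

    env_f[i]: envelope |Y_i(f, tau)| over tau at the frequency f being corrected
    env_neighbors: one list per neighboring frequency g, each holding the
                   envelopes at g already arranged according to Pi'
    """
    n_src = len(env_f)
    best = None
    for perm in itertools.permutations(range(n_src)):
        score = sum(cor(env_f[perm[k]], env_g[k])
                    for env_g in env_neighbors for k in range(n_src))
        if best is None or score > best[0]:
            best = (score, perm)
    return best

# Toy envelopes for N = 2: the outputs at f are swapped relative to those at g.
env_g = [[1, 2, 3, 4], [4, 1, 3, 2]]
env_f = [[8, 2, 6, 4.5], [2, 4, 6, 8.2]]
R_f, Pi_f = best_permutation(env_f, [env_g])
```

The exhaustive enumeration is practical only for a small number of sources N; for larger N some other search strategy would presumably be needed.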

Then the control section 170 stores the union of sets F and {ξ}
(where ξ = argmax_f R_f) in the temporary memory 171 as a new set F (step S82).
Then, the re-sorting section 247e sets f = ξ, rearranges the rows of the
second separation matrix W'(f) in accordance with permutation Π_f' to generate
a third separation matrix W''(f), and stores it in memory area 111 of the
memory 100 (step S83).

[0104] The control section 170 then determines whether set F stored in the
temporary memory 171 includes all discrete frequency elements f = 0, f_s/L, ...,
f_s(L - 1)/L (step S84). If set F does not include all discrete frequency elements
f = 0, f_s/L, ..., f_s(L - 1)/L, the control section 170 returns to step S78. On the
other hand, if set F includes all discrete frequency elements f = 0, f_s/L, ...,
f_s(L - 1)/L, the control section 170 ends processing at step S58.

It should be noted that, instead of the method described above, any
other method such as the method described in Japanese Patent Application
Laid-Open No. 2004-145172 or the "Reference literatures" may be used to
perform processing at step S58.
[0105] <Experimental results> 

Results of experiments on sound source separation according to the
first and second embodiments will be given below.
[Results of first experiment]

A first experiment was conducted using randomly arranged sensors.
The experimental conditions are as shown in Fig. 14A. Four
omnidirectional microphones arranged randomly were used. However, the only
information about the arrangement of the sensors provided to the signal
separating apparatus was the maximum distance between the microphones,
which was 4 cm. Three sound sources were used: English speech was
emitted through loudspeakers for three seconds. Fig. 14B shows the results
in terms of SIR (signal-to-interference ratio). The larger the SIR, the better
the separation performance. The results of experiments using four methods
for solving the permutation problem were compared. Env indicates the
result of a method using only information about the envelope |Y_p(f, τ)| of
separated signals, Basis indicates the result of a method using clustering of
normalized basis vectors A_p''(f) (the method according to the first
embodiment), Basis + Env indicates the result of a method using the
combination of these two items of information to solve the problem more
accurately (the method according to the second embodiment), and Optimal
indicates the result of a method using an optimum permutation obtained with
the knowledge of source signals s_k and impulse responses h_qk(r).
[0106] Comparison of the results shows that the method using only Env
provides varying separation performance whereas the method using Basis
according to the first embodiment provides sufficiently good separation
performance. The results obtained using the combination of Basis and Env
according to the second embodiment are almost as good as those of Optimal.
Thus, high performance of blind signal separation in the frequency domain
was achieved according to the present invention, even when the
sensors were randomly arranged.
[Results of second experiment]

A second experiment was conducted using orderly arranged sensors.
Fig. 15A shows the experimental conditions. Three omnidirectional
microphones were linearly spaced 4 cm apart. As in the first experiment, three
sound sources were used: English speech was emitted through
loudspeakers for three seconds. Fig. 15B shows the results. In this
experiment, comparison was made among the results obtained using six
methods, including the conventional-art method described earlier in which
estimates of signal source positions are clustered. DOA represents the result
of a method in which the permutation problem was solved by using only
estimates of DOA (direction of arrival) and DOA + Env represents the
result obtained by using the combination of estimates of DOA and information
about the envelope of separated signals.

[0107] Comparison of the results of the method using DOA and the
method using DOA + Env, which are conventional-art methods, with the
results of the methods using Basis and Basis + Env of the present invention
shows that the present invention generally provides improved performance in
the orderly sensor arrangement to which the conventional approaches can be
applied. It should be noted that the computational cost was approximately
equivalent to that of the prior-art methods.
<Features of the first and second embodiments>

Features of the first and second embodiments described above can 
be summarized as follows. 

(1) Because precise information about the positions of sensors is not
needed but only information about the upper limit of the distance between one
reference sensor and another sensor, a random arrangement of sensors can be
used and positional calibration is not required; and (2) because all information
obtained from basis vectors is used to perform clustering, the permutation
problem can be solved more accurately, thus improving the signal separation
performance.

[0108] The present invention is not limited to the embodiments described
above. For example, while the Moore-Penrose generalized inverse matrix is
used in the embodiments as the generalized inverse matrix, any other
generalized inverse matrix may be used.

The first normalizing section 142aa of the frequency normalizing
section 142a normalizes the argument of each element A_qp(f) of a basis vector
A_p(f) on the basis of a particular element A_Qp(f) of the basis vector A_p(f)
according to Equation (15) in the first embodiment. However, the first
normalizing section 142aa may normalize the argument of each element
A_qp(f) of a basis vector A_p(f) on the basis of a particular element A_Qp(f) of the
basis vector A_p(f) in accordance with the following equations:
[0109] [Formula 30]

A_{qp}'''(f) = |A_{qp}(f)| \exp\{ j \cdot ( \arg[A_{qp}(f) \cdot A_{Qp}^{*}(f)] ) \}   ... (27-1)
A_{qp}'''(f) = |A_{qp}(f)| \exp\{ j \cdot ( \arg[A_{qp}(f)] - \arg[A_{Qp}(f)] ) \}   ... (27-2)
A_{qp}'''(f) = |A_{qp}(f)| \exp\{ j \cdot \Psi( \arg[A_{qp}(f)/A_{Qp}(f)] ) \}   ... (27-3)

Here, "*" denotes the complex conjugate and "Ψ{·}" is a function, preferably a
monotonically increasing function, from the viewpoint of improving the
precision of clustering.

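The frequency normalizing section 142a may use any of the following
equations
[0110] [Formula 31]

$$A_{qp}'(f) = \rho \cdot \frac{\arg[ A_{qp}(f) / A_{Qp}(f) ]}{4 f c^{-1} d} \tag{28-1}$$

$$A_{qp}'(f) = \rho \cdot \frac{\arg[ A_{qp}(f) \cdot A_{Qp}^{*}(f) ]}{4 f c^{-1} d} \tag{28-2}$$

$$A_{qp}'(f) = \rho \cdot \frac{\arg[A_{qp}(f)] - \arg[A_{Qp}(f)]}{4 f c^{-1} d} \tag{28-3}$$

$$A_{qp}'(f) = \rho \cdot \frac{\Psi( \arg[ A_{qp}(f) / A_{Qp}(f) ] )}{4 f c^{-1} d} \tag{28-4}$$

instead of Equation (14) to perform frequency normalization. Here, ρ is a
constant (for example ρ = 1).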

While the norm normalizing section 142b in the above-described
embodiments performs normalization so that the norm becomes equal to 1, it
may perform normalization so that the norm becomes equal to a
predetermined number other than 1. Furthermore, the norm normalizing
section 142b may be omitted, in which case norm normalization is not
performed and the clustering section 143 performs clustering of the
frequency-normalized vectors A_p'(f). However, the norms of the
frequency-normalized vectors A_p'(f) are not equal. Accordingly, the
clustering criterion in this case is whether vectors are similar to each other
only in direction, rather than both in direction and norm. This means
evaluation using a degree of similarity. One example of the measure of
similarity may be the cosine distance

$$\cos\theta = \frac{ | {A_p'}^{H}(f) \cdot \eta_k | }{ \|A_p'(f)\| \cdot \|\eta_k\| }$$

where θ is the angle between a frequency-normalized vector A_p'(f) and the
vector of the centroid η_k. If cosine distances are used, the clustering section
143 generates a cluster that minimizes the total sum of the cosine distances
[0111] [Formula 32]

$$U_i = \sum_{A_p'(f) \in C_i} \frac{ | {A_p'}^{H}(f) \cdot \eta_i | }{ \|A_p'(f)\| \cdot \|\eta_i\| }$$

Here, the centroid η_k is the average among the members of each cluster.
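
The direction-only measure above can be illustrated with a minimal sketch.
It assumes vectors are held as plain Python lists of complex numbers; the
function name cosine_measure and the sample values are hypothetical and are
not taken from the embodiments.

```python
import math

def cosine_measure(a, eta):
    """Return |a^H eta| / (||a|| * ||eta||) for two complex vectors (lists)."""
    inner = sum(x.conjugate() * y for x, y in zip(a, eta))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))
    norm_eta = math.sqrt(sum(abs(y) ** 2 for y in eta))
    return abs(inner) / (norm_a * norm_eta)

# Vectors that differ only by a positive scale point in the same direction,
# so the measure ignores the unequal norms.
a = [1 + 1j, 2 - 1j]
same_direction = cosine_measure(a, [3 * x for x in a])

# Orthogonal vectors give a measure of zero.
orthogonal = cosine_measure([1 + 0j, 0j], [0j, 1j])
```

Because the measure depends only on direction, it remains usable when the
norm normalization is omitted.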

In the second embodiment, the reliability of a permutation for each
frequency is evaluated and, for a frequency for which the reliability of the
permutation is evaluated as low, the envelope of separated signals is used to
calculate a new permutation. However, a permutation for all frequencies
may be generated by using the envelope of separated signals, the center
vectors of clusters, and normalized basis vectors.

[0112] Furthermore, the envelope of separated signals may first be used to
compute a permutation, the reliability of the permutation may be evaluated
for each individual frequency, and then the method of the first embodiment
may be applied to a frequency evaluated as having a low-reliability
permutation to calculate a new permutation for that frequency.

While the second separation matrix W'(f) is used to compute the
envelope of separated signals in the second embodiment, the first separation
matrix W(f) or a matrix resulting from rearrangement of the rows of the first
separation matrix W(f) may be used to compute the envelope.

The same value of parameter d may be used for all sensors q or
different values may be set for different sensors q. For example, the distance
between the reference sensor and each sensor q may be set as the value of
parameter d for the sensor q.

[0113] [Third embodiment (example of the second aspect of the present
invention)]

The third embodiment of the present invention will be described
below.

The third embodiment uses the principles described above to extract
a target signal from mixed signals in which signals originating from multiple
sources are mixed, without having information about the direction of the
target signal.

<Configuration>

Like the signal separating apparatus in the first embodiment, a
signal separating apparatus of the present embodiment is configured by
loading a signal separating program into a computer of the well-known von
Neumann type. Fig. 16 is a block diagram illustrating a configuration of a
signal separating apparatus 1001 according to the third embodiment.

[0114] As shown in Fig. 16, the signal separating apparatus 1001 has a
memory 1100 including memory areas 1101 - 1114, a frequency domain
transforming section 1120, a signal separating section 1130, a target signal
selecting section 1140, a time-frequency masking section 1150 (which is
equivalent to the "separated signal generating section"), a time domain
transforming section 1160, a control section 1170, and a temporary memory
1180. The memory 1100 and the temporary memory 1180 may be, but are not
limited to, at least one of a register 10ac, a data area 10fb of an auxiliary
storage device 10f, and a data area 10db of a RAM 10d (Fig. 2). The
frequency domain transforming section 1120, the signal separating section
1130, the target signal selecting section 1140, the time-frequency masking
section 1150, the time domain transforming section 1160, the control section
1170, and the temporary memory 1180 are configured by an OS program and
a signal separating program read into a CPU 10a (Fig. 2), for example.
[0115] Fig. 17A is a block diagram illustrating a detailed configuration of
the target signal selecting section 1140 shown in Fig. 16. Fig. 17B is a block
diagram illustrating a detailed configuration of the basis vector clustering
section 1142 in Fig. 17A.

As shown in Figs. 17A and 17B, the target signal selecting section
1140 includes an inverse matrix computing section 1141 (which is equivalent
to the "complex vector generating section"), a basis vector clustering section
1142, and a selecting section 1143. The basis vector clustering section 1142
includes a frequency normalizing section 1142a (which constitutes the
"normalizing section"), a norm normalizing section 1142b (which constitutes
the "normalizing section"), a clustering section 1142c, and a variance
determining section 1142d. The frequency normalizing section 1142a
includes a first normalizing section 1142aa and a second normalizing section
1142ab.

[0116] Fig. 18A is a block diagram illustrating a detailed configuration of
the time-frequency masking section 1150 shown in Fig. 16. Fig. 18B is a
block diagram showing a detailed configuration of the mask generating
section 1151 shown in Fig. 18A.

As shown in Figs. 18A and 18B, the time-frequency masking
section 1150 includes a mask generating section 1151 and a masking section
1152. The mask generating section 1151 includes a whitening matrix
generating section 1151a, a whitening section 1151b, an angle computing
section 1151c, and a function operation section 1151d.
[0117] The solid arrows in Figs. 16 to 18 represent actual data flows and
the dashed arrows represent theoretical information flows. Flows of data
inputted to and outputted from the control section 1170 and the temporary
memory 1180 are not depicted. The signal separating apparatus 1001
performs processes under the control of the control section 1170. Unless
otherwise stated, the control section 1170 performs processing while reading
and writing required data in the temporary memory 1180.
<Processing>

Processing performed in the signal separating apparatus 1001
according to the third embodiment will be described below.
[0118] The assumption is that N signal sources k (k ∈ {1, 2, ..., N}) exist in
a space and their signals s_k(t) (where "t" is sampling time) are mixed and are
observed at M sensors q (q ∈ {1, 2, ..., M}) as mixed signals x_q(t). In the
third embodiment, a target signal originating from any of the signal sources is
extracted using only the mixed signals x_1(t), ..., x_M(t), and other
interfering signals are suppressed, to obtain a signal y(t). The number N of
signal sources may be greater than, less than, or equal to the number M of
sensors. Information about the number N of signal sources does not need to be
obtained beforehand. The processing may be performed in a situation where
signal sources cannot be counted.

[0119] [Outline of processing]
Fig. 19 is a flowchart outlining the whole signal separating process
according to the third embodiment. The outline of the signal separating
process in the third embodiment will be described with reference to Fig. 19.
First, mixed signals x_q(t) (q ∈ {1, ..., M}) in the time domain
observed by M sensors are stored in memory area 1101 in the memory 1100
during preprocessing. Once the signal separation is started, the frequency
domain transforming section 1120 reads the time-domain mixed signals x_q(t)
from memory area 1101 of the memory 1100. The frequency domain
transforming section 1120 then transforms them into the frequency-domain
mixed signals X_q(f, τ) by using a transformation such as a short-time Fourier
transformation, and stores the frequency-domain mixed signals X_q(f, τ) in
memory area 1102 of the memory 1100 (step S101).
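
As a rough illustration of step S101, the following sketch computes a
short-time Fourier transformation of one sensor signal using only the Python
standard library. The Hann window, the frame length of 32, the hop size of
16, and the name stft are assumptions made for this example; the text only
requires a transformation "such as" a short-time Fourier transformation.
Bin index k corresponds to frequency f = k·f_s/L.

```python
import cmath
import math

def stft(x, frame_len, hop):
    """Return X[tau][k]: DFT of Hann-windowed frames of the real sequence x."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len)
        ]
        frames.append(spectrum)
    return frames

# Example: a sinusoid sitting exactly on bin 4 of a 32-point frame.
L = 32
x = [math.cos(2 * math.pi * 4 * n / L) for n in range(128)]
X = stft(x, frame_len=L, hop=16)

# Strongest bin in the lower half of the spectrum of the first frame.
peak_bin = max(range(L // 2), key=lambda k: abs(X[0][k]))
```

A direct DFT is used here only to keep the sketch dependency-free; a
practical implementation would normally use an FFT.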

[0120] Then, the signal separating section 1130 reads the
frequency-domain mixed signals X_q(f, τ) from memory area 1102 of the
memory 1100. The signal separating section 1130 in this example applies
independent component analysis (ICA) to a mixed-signal vector X(f, τ) =
[X_1(f, τ), ..., X_M(f, τ)]^T consisting of the read mixed signals X_q(f, τ) to
calculate, for each individual frequency f, a separation matrix W(f) =
[W_1(f), ..., W_M(f)]^H of M rows and M columns (where "*^H" is the complex
conjugate transpose of a matrix *) and a separated signal vector

$$Y(f, \tau) = W(f) \cdot X(f, \tau) \tag{30}$$

(step S102). The calculated separation matrix W(f) is stored in memory area
1103 of the memory 1100. The separated signals Y_p(f, τ) (p ∈ {1, ..., M})
constituting the separated signal vector Y(f, τ) = [Y_1(f, τ), ..., Y_M(f, τ)]^T
are stored in memory area 1107. The processing at step S102 will be detailed
later.
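
Equation (30) is an ordinary matrix-vector product applied independently at
each frequency and each frame. The sketch below shows this for a single
frequency bin; the function name separate, the 2 x 2 mixing matrix H, and the
sample values are hypothetical, and the separation matrix is simply set to
the inverse of H rather than estimated by ICA.

```python
def separate(W, frames):
    """Apply Y(f, tau) = W(f) X(f, tau) at one frequency bin.

    W: M x M separation matrix as a list of rows.
    frames: list over tau of length-M mixed-signal vectors X(f, tau).
    """
    return [[sum(w * x for w, x in zip(row, X)) for row in W] for X in frames]

# Hypothetical 2 x 2 example: H mixes two sources, W = H^{-1} undoes it.
H = [[1, 1], [1, -1]]
W = [[0.5, 0.5], [0.5, -0.5]]
sources = [[1 + 2j, 3j], [0.5 + 0j, -1 + 0j]]

mixed = separate(H, sources)      # the same product forms X = H S
recovered = separate(W, mixed)    # Y = W X
```

In practice ICA determines W(f) only up to the order and scale of its rows,
which is why the later selection and normalization steps are needed.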

[0121] Then, the target signal selecting section 1140 reads the separation
matrix W(f) from memory area 1103 of the memory 1100, normalizes basis
vectors which are columns of the generalized inverse matrix of the separation
matrix W(f), and clusters the normalized basis vectors. The target signal
selecting section 1140 selects, for each frequency f, selection signals
Y_{I(f)}(f, τ) including the target signal and the basis vectors A_{I(f)}(f)
corresponding to them from the separated signals in memory area 1107 of the
memory 1100 on the basis of the variances of the clusters and stores them in
memory area 1111 of the memory 1100 (step S103). In the third embodiment,
the signal selected as the target signal is a signal from a source that is
near a sensor, whose power observed at the sensor therefore dominates the
signals from the other sources, and which is useful as information. The
processing at step S103 will be detailed later.

[0122] Then, the time-frequency masking section 1150 reads the
frequency-domain mixed signals X_q(f, τ) from memory area 1102 of the
memory 1100, reads the basis vectors A_{I(f)}(f) corresponding to the selection
signals Y_{I(f)}(f, τ) from memory area 1104, uses them to generate a
time-frequency mask M(f, τ), and stores it in memory area 1112 (step S104).
The processing at step S104 (processing by the time-frequency masking
section 1150) will be detailed later.

Then, the time-frequency masking section 1150 reads the selection
signals Y_{I(f)}(f, τ) selected by the target signal selecting section 1140 from
memory area 1107 of the memory 1100 and the time-frequency mask M(f, τ)
from memory area 1112. The time-frequency masking section 1150 then
applies the time-frequency mask M(f, τ) to the selection signals Y_{I(f)}(f, τ) to
further suppress interfering signal components remaining in the selection
signals Y_{I(f)}(f, τ), thereby generating masked selection signals
Y_{I(f)}'(f, τ), and stores them in memory area 1113 of the memory 1100
(step S105). The processing at step S105 (processing by the time-frequency
masking section 1150) will be detailed later.

[0123] Finally, the time domain transforming section 1160 reads the
selected separated signals Y_{I(f)}'(f, τ) in the frequency domain from memory
area 1113 of the memory 1100, applies a transformation such as a short-time
inverse Fourier transformation to them to generate separated signals y(t) in
the time domain, and stores them in memory area 1114 of the memory 1100
(step S106).

[Details of processing at step S102 (processing by the signal separating
section 1130)]

As mentioned above, the signal separating section 1130 in this
example uses independent component analysis (ICA) to compute separation
matrices W(f) = [W_1(f), ..., W_M(f)]^H consisting of M rows and M columns and
separated signal vectors Y(f, τ) = [Y_1(f, τ), ..., Y_M(f, τ)]^T from the
mixed-signal vectors X(f, τ) = [X_1(f, τ), ..., X_M(f, τ)]^T (step S102).
Independent component analysis (ICA) is a method for computing a
separation matrix W(f) such that the elements of a separated signal vector
Y(f, τ) = [Y_1(f, τ), ..., Y_M(f, τ)]^T are independent of one another. Various
algorithms have been proposed, including the one described in Non-patent
literature 4. Independent component analysis (ICA) separates and extracts
the target signals of the third embodiment, which are more powerful and more
non-Gaussian, more advantageously than the interfering signals, which are
less powerful and more Gaussian.

[0124] [Details of processing at step S103 (processing by the target signal
selecting section 1140)]

Independent component analysis (ICA) exploits independence of
signals to separate the signals. Therefore the separated signals Y_p(f, τ) have
ambiguity of the order. This is because the independence is retained even if
the order is changed. Therefore, a separated signal corresponding to a target
signal must be selected at each frequency. The target signal selecting section
1140 performs this selection through the following process.

Fig. 20 is a flowchart illustrating details of processing by the target
signal selecting section 1140 in the third embodiment. With reference to Fig.
20, processing by the target signal selecting section 1140 will be detailed
below.

[0125] First, the inverse matrix computing section 1141 reads, for each
frequency, a separation matrix W(f) consisting of M rows and M columns
from memory area 1103 of the memory 1100 and computes its inverse matrix

$$W(f)^{-1} = [A_1(f), \ldots, A_M(f)] \quad \text{(where the columns are } A_p(f) = [A_{1p}(f), \ldots, A_{Mp}(f)]^{T}\text{)} \tag{31}$$

Here, both sides of Equation (30) are multiplied by Equation
(31) to obtain the decomposition of the frequency-domain mixed signals
X(f, τ) as

[0126] [Formula 33]

$$X(f, \tau) = \sum_{p=1}^{M} A_p(f) \, Y_p(f, \tau) \tag{32}$$

Here, A_p(f) denotes basis vectors, each of which corresponds to a separated
signal Y_p(f, τ) at each frequency. The basis vectors A_p(f) calculated
according to Equation (31) are stored in memory area 1104 of the memory
1100 (step S111).
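
The relation between Equations (30), (31), and (32) can be checked
numerically with a small sketch. It is restricted to the 2 x 2 case so that
the inverse can be written in closed form; the helper names inverse_2x2 and
basis_vectors and all numeric values are hypothetical.

```python
def inverse_2x2(W):
    """Closed-form inverse of a 2 x 2 complex matrix given as two rows."""
    (a, b), (c, d) = W
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def basis_vectors(W):
    """Return the columns A_p of W^{-1} as a list indexed by p (Eq. 31)."""
    inv = inverse_2x2(W)
    return [[inv[q][p] for q in range(2)] for p in range(2)]

W = [[1 + 1j, 0.5], [-0.5j, 2]]
X = [0.3 - 0.2j, 1.0 + 0.5j]

# Separated signals at one time-frequency slot, Eq. (30).
Y = [sum(w * x for w, x in zip(row, X)) for row in W]

# Rebuild the mixture as the sum of basis vectors times separated signals, Eq. (32).
A = basis_vectors(W)
X_rebuilt = [sum(A[p][q] * Y[p] for p in range(2)) for q in range(2)]
```

Each term A_p(f)·Y_p(f, τ) is the contribution of one separated signal as
observed at the sensors, which is why the basis vectors carry the positional
information used for clustering.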

Then, the basis vector clustering section 1142 normalizes all basis
vectors A_p(f) (p = 1, ..., M and f = 0, f_s/L, ..., f_s(L - 1)/L). The
normalization is performed so that the normalized basis vectors A_p''(f) form
clusters that are dependent only on the positions of multiple signal sources
when the convolutive mixture of signals originating from the multiple sources
is approximated as a given model (for example a near-field model). In this
example, frequency normalization and norm normalization similar to those
used in the first embodiment are performed.

[0127] The frequency normalization is performed by the frequency
normalizing section 1142a of the basis vector clustering section 1142 (Fig.
17B). In particular, the frequency normalizing section 1142a reads basis
vectors A_p(f) (p = 1, ..., M and f = 0, f_s/L, ..., f_s(L - 1)/L) from memory
area 1104 of the memory 1100, normalizes them to frequency-normalized
vectors A_p'(f) that are independent of frequency, and stores them in memory
area 1105 of the memory 1100 (step S112). The normalization is performed
for each element A_qp(f) of the basis vector A_p(f) (the normalization will be
detailed later). The norm normalization is performed by the norm
normalizing section 1142b of the basis vector clustering section 1142 (Fig.
17B). In particular, the norm normalizing section 1142b reads
frequency-normalized vectors A_p'(f) from memory area 1105 of the memory
1100, normalizes them to normalized basis vectors A_p''(f) whose norm has a
predetermined value (1 in this example), and stores them in memory area
1106 of the memory 1100 (step S113). The normalization is performed for
each frequency-normalized vector A_p'(f) (the normalization will be detailed
later).

[0128] After the completion of the normalization of the basis vectors, the
clustering section 1142c (Fig. 17B) identifies M clusters C_i (i ∈ {1, ..., M})
formed by the normalized basis vectors A_p''(f). In this example, the
clustering section 1142c reads the normalized basis vectors A_p''(f) from
memory area 1106 of the memory 1100, clusters them into M clusters C_i (i =
1, ..., M), and stores information identifying each of the clusters C_i (for
example information indicating the normalized basis vectors A_p''(f) that
belong to the cluster) and the centroid (center vector) η_i of the cluster C_i
in memory areas 1109 and 1110, respectively, of the memory 1100 (step S114).
The clustering is performed so that the total sum U of the sums of squares U_i
of the distances between the elements (normalized basis vectors A_v''(f)) of
each cluster C_i and the centroid η_i of the cluster C_i
[0129] [Formula 34]

$$U_i = \sum_{A_v''(f) \in C_i} \| A_v''(f) - \eta_i \|^2 \tag{33}$$

is minimized. The minimization can be effectively performed by using the
k-means clustering described in Non-patent literature 6, for example. The
centroid η_i of a cluster C_i can be calculated as
[0130] [Formula 35]

$$\eta_i = \frac{ \sum_{A_v''(f) \in C_i} A_v''(f) / |C_i| }{ \left\| \sum_{A_v''(f) \in C_i} A_v''(f) / |C_i| \right\| } \tag{34}$$

where |C_i| is the number of elements (normalized basis vectors A_v''(f)) of a
cluster C_i and || * || is the norm of a vector "*". While the square of the
Euclidean distance is used as the distance, a generalized distance such as
the Minkowski distance may be used instead.
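
The clustering that minimizes the total of Equation (33) with centroids
given by Equation (34) can be sketched as a plain k-means loop. This is a
minimal illustration under stated assumptions, not the procedure of
Non-patent literature 6: the initial centroids are supplied by the caller,
the iteration count is fixed, and the names unit, sq_dist, centroid, and
kmeans are hypothetical.

```python
import math

def unit(v):
    """Scale a complex vector to unit norm."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b))

def centroid(members):
    """Eq. (34): mean of the cluster members, rescaled to unit norm."""
    dim = len(members[0])
    mean = [sum(v[q] for v in members) / len(members) for q in range(dim)]
    return unit(mean)

def kmeans(vectors, init_centroids, iterations=20):
    """Alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)),
                      key=lambda i: sq_dist(v, centroids[i]))
                  for v in vectors]
        for i in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = centroid(members)
    return labels, centroids

# Four hypothetical normalized basis vectors forming two tight groups.
vectors = [unit([1, 0.1j]), unit([1, -0.1j]), unit([0.1, 1]), unit([-0.1, 1])]
labels, centroids = kmeans(vectors, [vectors[0], vectors[2]])
```

The result of k-means depends on the initial centroids, so a practical
implementation would typically try several initializations and keep the one
with the smallest total U.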

Once M clusters C_i are obtained, the variance determining section
1142d (Fig. 17B) selects a cluster that corresponds to the target signal and
stores selection information I(f) indicating the selected cluster in memory area
1111 of the memory 1100 (step S115). In the third embodiment, the variance
U_i/|C_i| of clusters is used as an indicator to select separated signals
including the target signal. That is, the normalization of basis vectors in the
third embodiment is performed in such a manner that, when the convolutive
mixture of signals originating from multiple sources is approximated as a
predetermined model, the normalized basis vectors are dependent only on the
positions of the sources. However, there are various factors in a real
environment that are not reflected in such a model. The discrepancy
between a real environment and a model increases as the distance from a
signal source to a sensor increases. For example, wave components
reflected by objects such as walls are not taken into consideration in a
near-field model, and the ratio of the reflected wave components to a direct
wave component increases as the distance between the signal source and a
sensor increases. Therefore, the model becomes less accurate as the signal
source moves away from the sensor. Consequently, signals closer to a sensor
can be normalized under conditions closer to the real environment and
therefore the variance of clusters caused by a discrepancy between the real
environment and the model can be reduced. In the third embodiment, a signal
near a sensor is selected as the target signal. Therefore, a cluster that has a
smaller variance is simply selected as the cluster corresponding to the target
signal. The selection procedure (step S115) will be detailed later.

[0131] After the selection information I(f) for each frequency f is
computed, a selection signal Y_{I(f)}(f, τ) at each frequency f and its
corresponding basis vector A_{I(f)}(f) are selected. In particular, the selecting
section 1143 first reads the selection information I(f) from memory area 1111
of the memory 1100. The selecting section 1143 then reads a separated
signal corresponding to the selection information I(f) from memory area 1107
as the selection signal Y_{I(f)}(f, τ), reads its corresponding basis vector
A_{I(f)}(f) from memory area 1104, and stores them in memory area 1111
(step S116).
[0132] The normalizations at steps S112 and S113 (Fig. 20) will be detailed
below.

[Details of step S112 (frequency normalization)]

Fig. 21A is a flowchart illustrating details of the frequency
normalization performed at step S112.

First, the control section 1170 (Fig. 16) assigns 1 to parameter p and
stores it in the temporary memory 1180 (step S121). The control section
1170 also assigns 1 to parameter q and stores it in the temporary memory
1180 (step S122). Then, the frequency normalizing section 1142a (Fig. 17B)
reads the parameters d, c, and Q described above from memory area 1108 of
the memory 1100, reads the elements A_qp(f) of the basis vector A_p(f) from
memory area 1104, and reads the parameters p and q from the temporary
memory 1180. The frequency normalizing section 1142a then performs on
the elements A_qp(f) of the basis vector A_p(f) the following calculation
[0133] [Formula 36]

$$A_{qp}'(f) = |A_{qp}(f)| \exp\left[ j \, \frac{\arg[ A_{qp}(f) / A_{Qp}(f) ]}{4 f c^{-1} d} \right] \tag{35}$$

and stores the results A_qp'(f) in memory area 1105 of the memory 1100 as the
elements A_qp'(f) of a frequency-normalized vector A_p'(f) (step S123). Here,
arg[·] represents an argument, exp is the exponential function whose base is
Napier's number, and j is the imaginary unit. In particular, the normalization
is performed according to Equations (15) and (16) given earlier.
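
The effect of Equation (35) can be seen in a short sketch. The values
c = 340 m/s and d = 0.05 m, the reference index Q = 0, and the direct-path
test model (a gain and a pure delay per sensor, small enough that the phase
does not wrap) are assumptions chosen for illustration; the function names
are hypothetical. The sketch requires f > 0 because the exponent divides
by f.

```python
import cmath
import math

def frequency_normalize(A_p, f, Q=0, c=340.0, d=0.05):
    """Apply Eq. (35) to every element A_qp(f) of one basis vector.

    A_p: list of complex elements; f: frequency in Hz (must be > 0);
    Q: index of the reference element; c: propagation speed in m/s;
    d: distance parameter in m.
    """
    ref = A_p[Q]
    scale = 4.0 * f * d / c  # the denominator 4 f c^{-1} d
    return [abs(a) * cmath.exp(1j * cmath.phase(a / ref) / scale) for a in A_p]

def direct_path_vector(f, gains, delays):
    """Hypothetical test model: each element is a gain times a pure delay."""
    return [g * cmath.exp(-2j * math.pi * f * t) for g, t in zip(gains, delays)]

gains, delays = [1.0, 0.8], [0.0, 1.0e-4]
low = frequency_normalize(direct_path_vector(500.0, gains, delays), 500.0)
high = frequency_normalize(direct_path_vector(1000.0, gains, delays), 1000.0)
```

Under this model the phase of each element grows linearly with f, so dividing
the argument by a quantity proportional to f leaves the same value at both
frequencies.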

Then, the control section 1170 determines whether the value of
parameter q stored in the temporary memory 1180 satisfies q = M (step S124).
If q = M is not satisfied, the control section 1170 sets the calculation result
q + 1 as a new value of parameter q, stores it in the temporary memory 1180
(step S125), and then returns to step S123. On the other hand, if q = M, the
control section 1170 further determines whether p = M (step S126).

[0134] If p = M is not satisfied, the control section 1170 sets the calculation
result p + 1 as a new value of parameter p, stores it in the temporary memory
1180 (step S127), and then returns to step S122. On the other hand, if p = M,
the control section 1170 terminates processing at step S112. (End of the
detailed description of step S112 (frequency normalization))
[Details of step S113 (norm normalization)]

Fig. 21B is a flowchart illustrating details of the norm normalization
performed at step S113.
[0135] First, the control section 1170 assigns 1 to parameter p and stores it
in the temporary memory 1180 (step S131). Then, the norm normalizing
section 1142b reads the elements A_qp'(f) of the frequency-normalized vector
A_p'(f) from memory area 1105 of the memory 1100, calculates
[0136] [Formula 37]

$$\| A_p'(f) \| = \sqrt{ \sum_{q=1}^{M} | A_{qp}'(f) |^2 } \tag{38}$$

to obtain the norm || A_p'(f) || of the frequency-normalized vector A_p'(f), and
stores the frequency-normalized vector A_p'(f) and its norm || A_p'(f) || in the
temporary memory 1180 (step S132).

Then, the norm normalizing section 1142b reads the
frequency-normalized vector A_p'(f) and its norm || A_p'(f) || from the
temporary memory 1180, calculates

$$A_p''(f) = A_p'(f) / \| A_p'(f) \| \tag{39}$$

and stores the calculated normalized basis vector A_p''(f) in memory area 1106
of the memory 1100 (step S133). Then, the control section 1170 determines
whether the value of parameter p stored in the temporary memory 1180
satisfies p = M (step S134). If p = M is not satisfied, the control section 1170
sets the calculation result p + 1 as a new value of parameter p, stores it in the
temporary memory 1180 (step S135), and then returns to step S132. On the
other hand, if p = M, the control section 1170 terminates processing at step
S113. The reason why the normalized basis vectors A_p''(f) form clusters has
been described with respect to the first embodiment. (End of the detailed
description of step S113 (norm normalization))

[0137] The normalized basis vectors A_p''(f) thus generated are independent
of frequency and are dependent only on the positions of signal sources as
described in the first embodiment.
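
One property that follows directly from Equations (35) and (39) is that the
normalized basis vector does not change when the basis vector is multiplied
by an arbitrary nonzero complex constant: the ratio to the reference element
cancels the phase, and the norm normalization cancels the magnitude. The
sketch below checks this numerically. The function name, the parameter
values c = 340 m/s and d = 0.05 m, and the sample vector are assumptions made
for illustration.

```python
import cmath
import math

def normalize_basis_vector(A_p, f, Q=0, c=340.0, d=0.05):
    """Frequency normalization (Eq. 35) followed by norm normalization (Eq. 39)."""
    scale = 4.0 * f * d / c
    freq_norm = [abs(a) * cmath.exp(1j * cmath.phase(a / A_p[Q]) / scale)
                 for a in A_p]
    norm = math.sqrt(sum(abs(a) ** 2 for a in freq_norm))  # Eq. (38)
    return [a / norm for a in freq_norm]

A_p = [0.6 - 0.2j, 0.1 + 0.4j, -0.3 + 0.3j]
alpha = 2.5 * cmath.exp(1.1j)  # arbitrary complex scale applied to the vector

original = normalize_basis_vector(A_p, f=800.0)
rescaled = normalize_basis_vector([alpha * a for a in A_p], f=800.0)
```

This matters because ICA leaves the scale of each separated signal, and hence
of each basis vector, undetermined.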
[Details of procedure for selecting selection signals (step S115)]

Details of the procedure for selecting selection signals (step S115)
mentioned above will be illustrated below.
Cluster selection procedure 1

A first example selects the cluster that has the smallest variance as
the cluster corresponding to a target signal. Fig. 22 is a flowchart illustrating
the first example.
[0138] First, the variance determining section 1142d (Fig. 17B) reads
information identifying clusters C_i (i ∈ {1, ..., M}) from memory area 1109 of
the memory 1100 and also reads normalized basis vectors A_p''(f) ∈ C_i and
their centroids η_i from memory areas 1106 and 1110, respectively. The
variance determining section 1142d then calculates U_i for each i according
to Equation (33), counts the elements (normalized basis vectors A_v''(f)) that
belong to C_i to obtain |C_i|, calculates the variance of cluster C_i, U_i/|C_i|,
and stores it in the temporary memory 1180. Then, the variance determining
section 1142d selects the smallest one of the variances U_i/|C_i| stored in the
temporary memory 1180 and stores information indicating the cluster
corresponding to the smallest variance in the temporary memory 1180 as
cluster selection information

$$\iota = \arg\min_i \, U_i / |C_i| \tag{40}$$

(step S141). In Equation (40), argmin_i * represents the i that minimizes the
value of "*".

[0139] Then, the control section 1170 (Fig. 16) assigns 0 to parameter f
and stores it in the temporary memory 1180 (step S142).

Then, the variance determining section 1142d reads the cluster
selection information ι from the temporary memory 1180 and reads the
centroid η_ι that corresponds to the cluster selection information ι from
memory area 1110 of the memory 1100. The variance determining section
1142d also reads the normalized basis vectors A_p''(f) (p ∈ {1, ..., M}) from
memory area 1106 of the memory 1100. The variance determining section
1142d then calculates, for each frequency f, selection information

$$I(f) = \arg\min_p \| A_p''(f) - \eta_\iota \|^2 \tag{41}$$

and stores it in memory area 1111 (step S143).

[0140] Then, the control section 1170 reads parameter f from the
temporary memory 1180 and determines whether f = (L - 1)·f_s/L (step S144).
If f = (L - 1)·f_s/L is not satisfied, the control section 1170 adds f_s/L to the
value of parameter f, stores the result in the temporary memory 1180 as a new
value of parameter f (step S145), and then returns to step S143. On the other
hand, if f = (L - 1)·f_s/L, the control section 1170 terminates step S115.
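
Cluster selection procedure 1 can be summarized in a short sketch: compute
the variance of every cluster, pick the cluster with the smallest variance as
in Equation (40), and then, at each frequency, pick the basis vector closest
to that cluster's centroid as in Equation (41). The data layout (lists
indexed by frequency and by p), the function names, and the sample values are
hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b))

def select_target(vectors_by_freq, labels_by_freq, centroids):
    """vectors_by_freq[f][p]: normalized basis vector A_p''(f);
    labels_by_freq[f][p]: index of the cluster that vector belongs to."""
    n = len(centroids)
    sums, counts = [0.0] * n, [0] * n
    for vecs, labs in zip(vectors_by_freq, labels_by_freq):
        for v, i in zip(vecs, labs):
            sums[i] += sq_dist(v, centroids[i])   # accumulates U_i, Eq. (33)
            counts[i] += 1                        # accumulates |C_i|
    variances = [s / k if k else float("inf") for s, k in zip(sums, counts)]
    iota = min(range(n), key=lambda i: variances[i])          # Eq. (40)
    selection = [
        min(range(len(vecs)), key=lambda p: sq_dist(vecs[p], centroids[iota]))
        for vecs in vectors_by_freq                           # Eq. (41)
    ]
    return iota, selection, variances

# Two frequency bins, two separated signals; the order is swapped in bin 1.
centroids = [[1.0, 0.0], [0.0, 1.0]]
vectors_by_freq = [[[0.99, 0.1], [0.3, 0.9]],
                   [[-0.2, 1.0], [1.0, -0.05]]]
labels_by_freq = [[0, 1], [1, 0]]
iota, selection, variances = select_target(vectors_by_freq, labels_by_freq,
                                           centroids)
```

In this example cluster 0 is tighter, so it is selected, and the per-frequency
selection follows the target across the swapped ordering in the second bin.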
Cluster selection procedure 2

A second example selects clusters that have variances smaller than a
predetermined threshold value as the clusters corresponding to a target signal.
The threshold value is, for example, an empirically determined value or a value
based on experimental results and is stored in the memory 1100 beforehand.
[0141] The variance determining section 1142d sorts the variances U_i/|C_i|
of clusters in ascending or descending order by using any well-known
sorting algorithm, instead of performing step S141 (Fig. 22). The variance
determining section 1142d then reads the threshold value stored in the
memory 1100, selects clusters that have variances U_i/|C_i| smaller than the
threshold value, and stores the set of suffixes i that correspond to the selected
clusters in the temporary memory 1180 as cluster selection information ι.
The rest of the procedure is the same as cluster selection procedure 1.
Cluster selection procedure 3

A third example selects not only the cluster that has the smallest
variance but a predetermined number of clusters in ascending order of
variance U_i/|C_i| (for example, three clusters in ascending order of variance)
as clusters corresponding to a target signal.

[0142] The variance determining section 1142d sorts the variances U_i/|C_i|
of clusters in ascending or descending order using any well-known sorting
algorithm, instead of performing processing at step S141 (Fig. 22). The
variance determining section 1142d then selects a predetermined number of
clusters in ascending order of variance U_i/|C_i|. Then, the variance
determining section 1142d stores the set of suffixes i corresponding to the
selected clusters in the temporary memory 1180 as cluster selection
information ι. The rest of the procedure is the same as cluster selection
procedure 1.

Instead of cluster selection procedure 1, a procedure which
selects any of the clusters that have the second smallest variance or larger
may be used, or a combination of parts of the cluster selection procedures
described above may be used. (End of the description of step S115 and of
details of step S103 (processing by the target signal selecting section 1140))
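
Cluster selection procedures 2 and 3 differ from procedure 1 only in how the
set of selected cluster indices is formed from the sorted variances. A
minimal sketch follows; the function names, the threshold 0.05, and the
variance values are hypothetical.

```python
def select_by_threshold(variances, threshold):
    """Procedure 2: indices of clusters whose variance is below the threshold,
    listed in ascending order of variance."""
    order = sorted(range(len(variances)), key=lambda i: variances[i])
    return [i for i in order if variances[i] < threshold]

def select_smallest(variances, count):
    """Procedure 3: indices of the `count` clusters with the smallest variance."""
    order = sorted(range(len(variances)), key=lambda i: variances[i])
    return order[:count]

variances = [0.07, 0.0063, 0.02, 0.5]
below_threshold = select_by_threshold(variances, threshold=0.05)
three_smallest = select_smallest(variances, count=3)
```

Either result is a set of suffixes that replaces the single index obtained at
step S141.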
[0143] [Details of processing by the time-frequency masking section 1150
(steps S104 and S105)]

Processing by the time-frequency masking section 1150 will be
described below. As mentioned earlier, the time-frequency masking section
1150 suppresses interfering signal components remaining in selection signals
Y_{I(f)}(f, τ) selected by the target signal selecting section 1140. The reason
why interfering signals remain in the selection signals Y_{I(f)}(f, τ) will be
described first.

Focusing only on selection signals, Equation (30) given above can
be rewritten as

$$Y_{I(f)}(f, \tau) = W_{I(f)}^{H}(f) \cdot X(f, \tau) \tag{42}$$

If Equation (4) is substituted in Equation (42) and frequency f is
omitted, the equation can be rewritten as
[0144] [Formula 38]

$$Y_I(\tau) = W_I^{H} \cdot H_I \cdot S_I(\tau) + \sum_{k = 1, \ldots, I-1, I+1, \ldots, N} W_I^{H} \cdot H_k \cdot S_k(\tau) \tag{43}$$

If N ≤ M, W_I that satisfies W_I^H · H_k = 0, ∀k ∈ {1, ..., I-1, I+1, ..., N}
can be set by using independent component analysis (ICA). Then, the second
term in Equation (43) will be 0. However, if the number N of signal sources is
greater than the number M of sensors, which is a more common situation,
there is κ ⊆ {1, ..., I-1, I+1, ..., N} that results in W_I^H · H_k ≠ 0, ∀k ∈ κ.
In this case, selection signals Y_{I(f)}(f, τ) include unnecessary residual
components (residual components of interfering signals)
[0145] [Formula 39]

$$\sum_{k \in \kappa} W_I^{H} \cdot H_k \cdot S_k(\tau)$$

(hereinafter f is not omitted).

The purpose of using the time-frequency masking section 1150 is to
suppress such unnecessary residual components included in selection signals
Y_{I(f)}(f, τ), thereby generating masked selection signals Y_{I(f)}'(f, τ)
including less residual interfering signal components. For this purpose, the
mask generating section 1151 (Fig. 18) of the time-frequency masking section
1150 generates a time-frequency mask 0 ≤ M(f, τ) ≤ 1 that takes on a smaller
value for a time-frequency slot containing more residual interfering signal
components and a greater value for a time-frequency slot containing less
residual interfering signal components. Then, the masking section 1152
performs masking in accordance with

$$Y_{I(f)}'(f, \tau) = M(f, \tau) \cdot Y_{I(f)}(f, \tau) \tag{44}$$

and outputs masked selection signals Y_{I(f)}'(f, τ). The mask generation will
be detailed below.

[0146] [Details of step S104 (processing by mask generating section 1151)]

Fig. 23 is a flowchart illustrating details of step S104 in Fig. 19.
With reference to the flowchart, step S104 (processing by the mask generating
section 1151) will be detailed below.

The mask generating section 1151 in this example obtains the angle
θ_{I(f)}(f, τ) between a mixed-signal vector X(f, τ) and a basis vector A_{I(f)}(f)
corresponding to a selection signal in a space in which the frequency-domain
mixed-signal vector X(f, τ) is whitened (a whitening space), and generates a
time-frequency mask based on the angle θ_{I(f)}(f, τ). Whitening linearly
transforms a mixed-signal vector X(f, τ) so that its covariance matrix
becomes equal to an identity matrix.

[0147] For that purpose, first the whitening matrix generating section
1151a uses frequency-domain mixed signals X_q(f, τ) to generate a whitening
matrix V(f) which maps a mixed-signal vector X(f, τ) into a whitening
space (step S151). In this example, the whitening matrix generating section
1151a reads the mixed signals X_q(f, τ) from memory area 1102 of the memory
1100, computes V(f) = R(f)^{-1/2}, where R(f) = <X(f, τ)·X(f, τ)^H>_τ, as a
whitening matrix V(f), and stores it in memory area 1112. Here, <*>_τ
represents the time average of "*", "*^H" represents the
complex conjugate transpose of "*", and R^{-1/2} represents a
matrix that satisfies R^{-1/2}·R·(R^{-1/2})^H = I (where I is the identity matrix). A
typical method for calculating the whitening matrix V(f) is to decompose R(f)
into eigenvalues as R(f) = E(f)·D(f)·E(f)^H (where E(f) is a unitary matrix and
D(f) is a diagonal matrix) and calculate V(f) = D(f)^{-1/2}·E(f)^H. Here, D(f)^{-1/2} is
equivalent to a diagonal matrix obtained by raising each diagonal element of the
diagonal matrix D(f) to the (-1/2)-th power and therefore can be calculated
element by element.
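The eigenvalue-based computation of the whitening matrix can be sketched as follows. This is an illustrative NumPy sketch for a single frequency bin, assuming the mixed signals are held in an array of shape (M, T) and that R(f) has full rank. The function name `whitening_matrix` and the random test data are hypothetical.

```python
import numpy as np

def whitening_matrix(X):
    """Return V = D^(-1/2) E^H for one frequency bin.

    X has shape (M, T): M sensors, T time frames.
    """
    T = X.shape[1]
    R = (X @ X.conj().T) / T                 # R = <X X^H>_tau
    eigenvalues, E = np.linalg.eigh(R)       # R = E diag(eigenvalues) E^H
    return np.diag(eigenvalues ** -0.5) @ E.conj().T

# Hypothetical data: 3 sources mixed onto 3 sensors.
rng = np.random.default_rng(0)
mixing = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
sources = rng.standard_normal((3, 2000)) + 1j * rng.standard_normal((3, 2000))
X = mixing @ sources
V = whitening_matrix(X)
Z = V @ X                                    # whitened mixed-signal vectors
covariance_of_Z = (Z @ Z.conj().T) / Z.shape[1]
```

The covariance of the whitened vectors Z equals the identity matrix up to rounding, which is the defining property of the whitening space.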

[0148] Then, the whitening section 1151b uses the whitening matrix V(f)
to map the mixed-signal vector X(f, τ) to the whitening space to obtain a
whitened mixed-signal vector Z(f, τ) and to map the basis vector A_{I(f)}(f) to the
whitening space to obtain a whitened basis vector B_{I(f)}(f) (step S152). In this
example, the whitening section 1151b first reads mixed signals X_q(f, τ) from
memory area 1102 of the memory 1100, the basis vectors A_{I(f)}(f)
corresponding to selection signals Y_{I(f)}(f, τ) from memory area 1111, and the
whitening matrix V(f) from memory area 1112. The whitening section
1151b then calculates a whitened mixed-signal vector Z(f, τ) using the
operation Z(f, τ) = V(f)·X(f, τ), calculates a whitened basis vector B_{I(f)}(f) using
the operation B_{I(f)}(f) = V(f)·A_{I(f)}(f), and then stores them in memory area 1112
of the memory 1100.

[0149] Then, the angle computing section 1151c computes the angle θ_{I(f)}(f,
τ) between the whitened mixed-signal vector Z(f, τ) and the whitened basis
vector B_{I(f)}(f) for each time-frequency slot (step S153). In this example, the
angle computing section 1151c first reads the whitened mixed-signal vector
Z(f, τ) and the whitened basis vector B_{I(f)}(f) from memory area 1112 of the
memory 1100. The angle computing section 1151c then calculates the angle
θ_{I(f)}(f, τ) in each time-frequency slot as

\theta_{I(f)}(f, \tau) = \cos^{-1}\left( \frac{|B_{I(f)}^{H}(f) \cdot Z(f, \tau)|}{\|B_{I(f)}(f)\| \cdot \|Z(f, \tau)\|} \right)   ... (45)

and stores it in memory area 1112. In Equation (45), |*| represents the
absolute value of "*" and ‖*‖ represents the norm of the vector "*".
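Equation (45) can be sketched in a few lines. This is a minimal sketch, assuming one frequency bin with the whitened basis vector in an array of shape (M,) and the whitened mixed-signal vectors in the columns of an array of shape (M, T), none of which is all zero. The function name `angle_to_basis` and the sample values are hypothetical.

```python
import numpy as np

def angle_to_basis(B, Z):
    """Angle between whitened basis vector B (shape (M,)) and each column
    of the whitened mixtures Z (shape (M, T)), following Equation (45)."""
    inner = np.abs(B.conj() @ Z)                               # |B^H Z|
    cosine = inner / (np.linalg.norm(B) * np.linalg.norm(Z, axis=0))
    return np.arccos(np.clip(cosine, 0.0, 1.0))               # clip guards rounding

B = np.array([1.0 + 0j, 0.0, 0.0])
Z = np.array([[2.0j, 0.0, 1.0],     # columns: parallel, orthogonal, 45 degrees
              [0.0,  3.0, 1.0],
              [0.0,  0.0, 0.0]])
theta = angle_to_basis(B, Z)
```

Because the absolute value of the inner product is used, the angle ignores a complex scaling of Z, so a slot containing only a scaled copy of the basis vector yields an angle of 0.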
[0150] Then, the function operation section 1151d generates a
time-frequency mask M(f, τ), which is a function including the angle θ_{I(f)}(f, τ)
as an element (step S154). In this example, the function operation section
1151d first reads real-number parameters θ_T and g from memory area 1108 of
the memory 1100 and the angle θ_{I(f)}(f, τ) from memory area 1112. The
function operation section 1151d then calculates a logistic function

M(\theta(f, \tau)) = \frac{1}{1 + e^{g \cdot (\theta(f, \tau) - \theta_T)}}   ... (46)

as the time-frequency mask M(f, τ). The real-number parameters θ_T and g
are parameters that specify the turning point and gradient, respectively, of the
time-frequency mask M(f, τ), and are stored in memory area 1108 during
preprocessing. Fig. 24A shows an exemplary time-frequency mask M(f, τ)
calculated using the two real-number parameters θ_T and g according to
Equation (46). As shown, the smaller the real-number parameter θ_T, the
narrower the area where the time-frequency mask M(f, τ) takes on a large
value (1 in this example). Consequently, as the value of
the real-number parameter θ_T decreases, the quantity of interfering signal
components remaining in the masked selection signal Y_{I(f)}'(f, τ) decreases but
at the same time the masked selection signal Y_{I(f)}'(f, τ) becomes unnatural.
For example, if the target signal is a speech signal, musical noise increases as
the value of the real-number parameter θ_T decreases. Furthermore, the
waveform of the time-frequency mask M(f, τ) (transition from a large value (1
in this example) to a small value (0 in this example)) steepens with increasing
value of the real-number parameter g. To minimize interfering signal
components remaining in the masked selection signal Y_{I(f)}'(f, τ) while keeping
the masked selection signal Y_{I(f)}'(f, τ) natural, it is desirable to smooth the
waveform of the time-frequency mask M(f, τ) by using a small value of the
real-number parameter g.

[0151] Values of the real-number parameters θ_T and g may be set for each
frequency. An additional real-number parameter α may be introduced and
the logistic function

M(\theta(f, \tau)) = \frac{\alpha}{1 + e^{g \cdot (\theta(f, \tau) - \theta_T)}}   ... (47)

may be used as the time-frequency mask M(f, τ). Any other function may be
used as the time-frequency mask M(f, τ) that takes on a larger value in a
region where the angle θ_{I(f)}(f, τ) is close to 0 and takes on a smaller value in a
region where the angle θ_{I(f)}(f, τ) is large, that is, 0 ≤ M(θ(f, τ)) ≤ 1. (End of
the detailed description of step S104 (processing by the mask generating
section 1151))
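The logistic mask can be sketched as below. This is an illustrative sketch only. The function name `logistic_mask` and the parameter values are hypothetical, and placing the optional scale factor `alpha` in the numerator is an assumption of this sketch. With `alpha = 1` the function reduces to the two-parameter logistic function with turning point θ_T and gradient g.

```python
import numpy as np

def logistic_mask(theta, theta_T, g, alpha=1.0):
    """alpha / (1 + exp(g * (theta - theta_T))).

    theta_T sets the turning point and g the steepness of the transition;
    alpha (assumed here to scale the numerator) defaults to 1.
    """
    theta = np.asarray(theta, dtype=float)
    return alpha / (1.0 + np.exp(g * (theta - theta_T)))

# Illustrative (hypothetical) parameter values.
theta_T, g = np.pi / 3, 20.0
theta = np.array([0.0, theta_T, np.pi / 2])
mask = logistic_mask(theta, theta_T, g)
```

The mask is close to 1 for angles near 0, equals 0.5 at the turning point θ_T, and falls toward 0 for larger angles. A larger g makes that transition steeper.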

[0152] [Details of step S105 (processing by the masking section 1152)]
The masking section 1152 reads the selection signal Y_{I(f)}(f, τ) from
memory area 1111 of the memory 1100 and the time-frequency mask M(f, τ)
from memory area 1112. The masking section 1152 then calculates a
masked selection signal Y_{I(f)}'(f, τ) as

Y_{I(f)}'(f, \tau) = M(f, \tau) \cdot Y_{I(f)}(f, \tau)   ... (48)

and stores it in memory area 1113 of the memory 1100. (End of the detailed
description of step S105 (processing by the masking section 1152))
[0153] [Effects of the time-frequency masking]

Effects of the time-frequency mask M(f, τ) described above will be
described next.

If the sparseness of the signal sources is so high that the source signals
S_k(f, τ) are likely to be close to 0, Equation (4) can be approximated as
[0154] [Formula 40]

X(f, \tau) \approx H_k(f) \cdot S_k(f, \tau), \quad k \in \{1, \ldots, N\}   ... (49)

where k is the suffix associated with each signal source and is determined by
each time-frequency position (f, τ). Accordingly, in a time-frequency
position (f, τ) where only or practically only the target signal is active, the
whitened mixed-signal vector Z(f, τ) can be approximated as
[0155] [Formula 41]

Z(f, \tau) \approx V(f) \cdot H_{I(f)}(f) \cdot S_{I(f)}(f, \tau) \approx V(f) \cdot A_{I(f)}(f) \cdot Y_{I(f)}(f, \tau)

where Y_{I(f)}(f, τ) is a scalar. As mentioned above, the whitened basis vector
B_{I(f)}(f) is

B_{I(f)}(f) = V(f) \cdot A_{I(f)}(f)   ... (50)

It can be seen from the foregoing that the angle θ_{I(f)}(f, τ) between a whitened
mixed-signal vector Z(f, τ) and a whitened basis vector B_{I(f)}(f) approaches 0 at
a time-frequency position (f, τ) where only or practically only the target
signal is active. As stated above, the time-frequency mask M(f, τ) takes on a
larger value in a region where the angle θ_{I(f)}(f, τ) is closer to 0. Therefore,
the time-frequency mask M(f, τ) extracts a selection signal Y_{I(f)}(f, τ) at a
time-frequency position (f, τ) where only or practically only the target signal
is active as a masked selection signal Y_{I(f)}'(f, τ) (see Equation (48)).

[0156] On the other hand, if I(f) = 1, the whitened mixed-signal vector Z(f,
τ) in a time-frequency position (f, τ) where the target signal is almost inactive
can be approximated as
[0157] [Formula 42]

Z(f, \tau) \approx \sum_{k=2}^{N} V(f) \cdot H_k(f) \cdot S_k(f, \tau)   ... (51)

Here, if the number N of signal sources is equal to or less than the number M
of sensors, vectors V(f)·H_1(f), ..., V(f)·H_k(f) in a whitening space are
orthogonal to each other. S_k(f, τ) in Equation (51) is a scalar value. Thus,
it can be seen that the angle θ_{I(f)}(f, τ) between the whitened mixed-signal
vector Z(f, τ) and the whitened basis vector B_{I(f)}(f) increases. If N > M, the
whitened basis vector B_{I(f)}(f) (I(f) = 1) tends to form a large angle with vectors
V(f)·H_2(f), ..., V(f)·H_k(f) other than the target signal. It can be seen from the
foregoing that the angle θ_{I(f)}(f, τ) takes on a large value at a time-frequency
position (f, τ) where the target signal is almost inactive. Because the
time-frequency mask M(f, τ) takes on a small value in a region where the
angle θ_{I(f)}(f, τ) is far from 0, the time-frequency mask M(f, τ) excludes a
selection signal Y_{I(f)}(f, τ) at a time-frequency position (f, τ) where the target
signal is almost inactive from a masked selection signal Y_{I(f)}'(f, τ) (see
Equation (48)).
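The two cases discussed above (target active alone, target inactive) can be checked numerically. The following is a sketch under stated assumptions: N = M = 3, a hypothetical random mixing matrix H, uncorrelated unit-power sources so that R = H·H^H, and the true mixing vector H_1 used in place of the estimated basis vector. The helper `angle` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# Covariance of X = H S for uncorrelated unit-power sources: R = H H^H.
R = H @ H.conj().T
eigenvalues, E = np.linalg.eigh(R)
V = np.diag(eigenvalues ** -0.5) @ E.conj().T        # whitening matrix

def angle(b, z):
    """Angle between two complex vectors, as in Equation (45)."""
    cosine = np.abs(b.conj() @ z) / (np.linalg.norm(b) * np.linalg.norm(z))
    return np.arccos(np.clip(cosine, 0.0, 1.0))

B = V @ H[:, 0]                                      # whitened basis vector of the target
angle_target_only = angle(B, (V @ H[:, 0]) * (0.7 - 0.2j))   # only source 1 active
angle_interferer_only = angle(B, (V @ H[:, 1]) * 1.3)        # only source 2 active
```

Under these assumptions the first angle is 0 and the second is π/2 up to rounding, so a mask that decreases with the angle keeps the first slot and suppresses the second.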

[0158] It can be seen from the foregoing that the time-frequency masking
using the time-frequency mask M(f, τ) further suppresses interfering signal
components remaining in the selection signal Y_{I(f)}(f, τ).

The time-frequency masking is effective especially for signals
having sparseness such as speech or music. Less sparse signals contain a
large quantity of other interfering signal components even in a time-frequency
position (f, τ) where a target signal is active, so the approximation by
Equation (49) does not hold and the angle θ_{I(f)}(f, τ) will be far from 0. That is,
if a signal is not sparse, vectors V(f)·H_2(f) and V(f)·H_3(f) corresponding to
interfering signals exist together with the vector V(f)·H_1(f) corresponding to
the target signal (I(f) = 1) in a time-frequency position (f, τ) as shown in Fig.
24B, for example. In this example, the whitened mixed-signal vector Z(f, τ)
is
[0159] [Formula 43]

Z(f, \tau) \approx \sum_{k=1}^{3} V(f) \cdot H_k(f) \cdot S_k(f, \tau)   ... (52)

Therefore, the angle θ_{I(f)}(f, τ) between the whitened mixed-signal vector Z(f,
τ) and the whitened basis vector B_{I(f)}(f) is also far from 0. This shows that a
signal at a time-frequency position (f, τ) where the target signal is active can
be excluded from masked selection signals Y_{I(f)}'(f, τ).

The time-frequency masking is also especially effective in a case
where the power of a target signal is sufficiently large compared with that of
interfering signals. That is, even in a situation where sparseness is low and
other interfering signal components exist at a time-frequency position (f, τ)
where the target signal is active, the approximation by Equation (49) is
relatively likely to hold and the angle θ_{I(f)}(f, τ) approaches 0 if the power of
the target signal is sufficiently large compared with that of the interfering
signals. For example, if the power of the target signal is sufficiently large
compared with the power of interfering signals, the contribution of the
interfering signals in Equation (52) is low and the angle θ_{I(f)}(f, τ) between the
whitened mixed-signal vector Z(f, τ) and the whitened basis vector B_{I(f)}(f)
approaches 0. This shows that the possibility that the signals at a
time-frequency position (f, τ) where the target signal is active will be
excluded from the masked selection signals Y_{I(f)}'(f, τ) can be decreased. It
also means that interfering signal components remaining in the masked
selection signal Y_{I(f)}'(f, τ) can be reduced to a relatively low level. (End of
detailed description of step S105 (processing by the masking section 1152))
[0160] [Fourth embodiment (Example of the second aspect of the invention)]

The fourth embodiment of the present invention will be described
below.

The fourth embodiment is a variation of the third embodiment and
is the same as the third embodiment except that time-frequency masking
using a time-frequency mask is not performed. The following description
will focus on differences from the third embodiment and the description of the
same elements as those in the third embodiment will be omitted.
<Configuration>

Fig. 25 is a block diagram showing an exemplary signal separating
apparatus 1200 according to the fourth embodiment.
[0161] As shown in Fig. 25, the signal separating apparatus 1200 of the
fourth embodiment differs from the signal separating apparatus 1001 in that
the memory 1100 does not include memory areas 1112 and 1113 and the
time-frequency masking section 1150 is not provided.
<Processing>

Processing performed in the signal separating apparatus 1200
according to the fourth embodiment will be described below.

Fig. 26 is a flowchart illustrating processing performed in the signal
separating apparatus 1200 according to the fourth embodiment. The
following description focuses on differences from the third embodiment.

[0162] First, as in the third embodiment, a frequency domain transforming
section 1120 reads time-domain mixed signals x_q(t) from memory area 1101
of a memory 1100. The frequency domain transforming section 1120 then
transforms them into frequency-domain mixed signals X_q(f, τ) using a
transformation such as a short-time Fourier transformation and stores them in
memory area 1102 of the memory 1100 (step S161).

Then, a signal separating section 1130 reads the frequency-domain
mixed signals X_q(f, τ) from memory area 1102 of the memory 1100. The
signal separating section 1130 in this example applies independent component
analysis (ICA) to a mixed-signal vector X(f, τ) = [X_1(f, τ), ..., X_M(f, τ)]^T
consisting of the read mixed signals X_q(f, τ) to calculate a separation matrix
of M rows and M columns W(f) = [W_1(f), ..., W_M(f)]^H (where "*^H" is the
complex conjugate transposed matrix of a matrix "*") and a separated signal
vector Y(f, τ) = W(f)·X(f, τ) for each frequency f (step S162). The
calculated separation matrix W(f) is stored in memory area 1103 of the
memory 1100. The separated signals Y_p(f, τ) (p ∈ {1, ..., M}) constituting
the separated signal vector Y(f, τ) = [Y_1(f, τ), ..., Y_M(f, τ)]^T are stored in
memory area 1107.

[0163] Then, a target signal selecting section 1140 reads the separation
matrix W(f) from memory area 1103 of the memory 1100, normalizes basis
vectors which are columns of its generalized inverse matrix, and clusters the
normalized basis vectors. The target signal selecting section 1140 then
selects selection signals Y_{I(f)}(f, τ) from the separated signals in memory area
1107 of the memory 1100 for each frequency using the variance of the
clusters as the reference and stores them in memory area 1111 of the memory
1100 (step S163).

Then, a time domain transforming section 1160 reads the selected
separated signals Y_{I(f)}(f, τ) from memory area 1111 of the memory 1100 and
applies a transformation such as a short-time inverse Fourier transformation to
them to generate time-domain separated signals y(t), and stores them in
memory area 1114 of the memory 1100 (step S164).

[0164] [Fifth embodiment (Example of the second aspect of the invention)]
The fifth embodiment of the present invention will be described
below.

The fifth embodiment is a variation of the third embodiment. The
only difference from the third embodiment is the method for generating a
time-frequency mask. The following description will focus on differences
from the third embodiment and description of the same elements as those in
the third embodiment will be omitted.
<Configuration>

Fig. 27 is a block diagram showing an exemplary signal separating
apparatus 1300 according to the fifth embodiment. Fig. 28A is a block
diagram showing a detailed configuration of a time-frequency masking
section 1350 shown in Fig. 27. Fig. 28B is a block diagram showing a
detailed configuration of a mask generating section 1351 shown in Fig. 28A.
In these drawings, the same elements as those in the third embodiment are
labeled with the same reference numerals used in the drawings of the third
embodiment.

[0165] As shown in Fig. 27, the signal separating apparatus 1300 of the
fifth embodiment differs from the signal separating apparatus 1001 in that the
signal separating apparatus 1300 has the time-frequency masking section
1350 instead of the time-frequency masking section 1150, and that the
memory 1100 has memory areas 1308 and 1312 instead of memory areas
1108 and 1112. As shown in Fig. 28A, the time-frequency masking section
1350 includes the mask generating section 1351 and the masking section 1152.
As shown in Fig. 28B, the mask generating section 1351 includes a frequency
normalizing section 1351a, a norm normalizing section 1351b, a centroid
extracting section 1351c, a squared-distance computing section 1351d, and a
function generating section 1351e. The frequency normalizing section
1351a includes a first normalizing section 1351aa and a second normalizing
section 1351ab. The centroid extracting section 1351c includes a centroid
selecting section 1351ca and a norm normalizing section 1351cb.
[0166] <Mask generation>

The fifth embodiment differs from the third embodiment only in
time-frequency mask generation (step S104). The time-frequency mask
generation of the fifth embodiment will be described below.

Fig. 29 is a flowchart illustrating a process for generating a
time-frequency mask according to the fifth embodiment. Fig. 30A is a
flowchart illustrating details of step S171 in Fig. 29. Fig. 30B is a flowchart
illustrating details of step S172 in Fig. 29. Referring to these flowcharts, the
time-frequency mask generating process will be described below.
[0167] First, the frequency normalizing section 1351a of the mask
generating section 1351 normalizes a mixed-signal vector X(f, τ) consisting
of frequency-domain mixed signals X_q(f, τ) stored in memory area 1102 of
the memory 1100 to a frequency-normalized vector X'(f, τ) that is
independent of frequency (frequency normalization) and stores the elements
X_q'(f, τ) of the frequency-normalized vector X'(f, τ) in memory area 1312 of
the memory 1100 (step S171).
[Details of frequency normalization (step S171)]

The frequency normalization (step S171) will be detailed below.
[0168] First, a control section 1170 (Fig. 27) assigns 1 to parameter q and
stores it in a temporary memory 1180 (step S181). Then, the frequency
normalizing section 1351a (Fig. 28B) reads parameters d, c, and Q described
earlier from memory area 1308 of the memory 1100, reads the elements X_q(f,
τ) of the mixed-signal vector X(f, τ) corresponding to each (f, τ) from
memory area 1102, and reads the parameter q from the temporary memory
1180. The frequency normalizing section 1351a then calculates
[0169] [Formula 44]

X_q'(f, \tau) = |X_q(f, \tau)| \exp\left[ j \frac{\arg[X_q(f, \tau) / X_Q(f, \tau)]}{4 f c^{-1} d} \right]   ... (53)

and stores the result in memory area 1312 of the memory 1100 as each
element of a frequency-normalized vector X'(f, τ) = [X_1'(f, τ), ..., X_M'(f, τ)]^T
(step S182). Here, arg[·] represents an argument and j represents the
imaginary unit.
In particular, the first normalizing section 1351aa of the frequency
normalizing section 1351a normalizes the argument of each element X_q(f, τ)
of a mixed-signal vector X(f, τ) by using one particular element X_Q(f, τ) of the
mixed-signal vector X(f, τ) as a reference according to the following
operation.

[0170] [Formula 45]

X_q'''(f, \tau) = |X_q(f, \tau)| \exp\{ j \cdot \arg[X_q(f, \tau) / X_Q(f, \tau)] \}   ... (54)

Then, the second normalizing section 1351ab of the frequency
normalizing section 1351a divides the argument of each of the elements X_q'''(f,
τ) normalized by the first normalizing section 1351aa by a value 4fc^{-1}d
proportional to the frequency f, as follows.
[0171] [Formula 46]

X_q'(f, \tau) = |X_q'''(f, \tau)| \exp\left[ j \frac{\arg[X_q'''(f, \tau)]}{4 f c^{-1} d} \right]   ... (55)

Then, the control section 1170 determines whether the value of
parameter q stored in the temporary memory 1180 satisfies q = M (step S183).
If q = M is not satisfied, the control section 1170 sets a calculation result q + 1 as a new
value of the parameter q, stores it in the temporary memory 1180 (step S184),
and then returns to step S182. On the other hand, if q = M, the control
section 1170 terminates processing at step S171 and causes processing at step
S172, described below, to be performed. (End of the detailed description of
the frequency normalization (step S171))

[0172] Then, the norm normalizing section 1351b of the mask generating
section 1351 normalizes a frequency-normalized vector X'(f, τ) consisting of
the elements X_q'(f, τ) stored in memory area 1312 of the memory 1100 to a
norm-normalized vector X''(f, τ) whose norm has a predetermined value (1 in
this example) (norm normalization) and stores the elements X_q''(f, τ) in
memory area 1312 (step S172).

[Details of norm normalization (step S172)]

The norm normalization (step S172) will be detailed below.
[0173] First, the norm normalizing section 1351b (Fig. 28B) reads the
frequency-normalized vectors X'(f, τ) = [X_1'(f, τ), ..., X_M'(f, τ)]^T, each of which
corresponds to (f, τ), from memory area 1312 of the memory 1100. The
norm normalizing section 1351b then calculates their norms ‖X'(f, τ)‖ as
[0174] [Formula 47]

\|X'(f, \tau)\| = \sqrt{\sum_{q=1}^{M} |X_q'(f, \tau)|^2}

and stores the frequency-normalized vectors X'(f, τ) and the norms ‖X'(f, τ)‖
in the temporary memory 1180 (step S185).

Then, the norm normalizing section 1351b reads the
frequency-normalized vector X'(f, τ) corresponding to each (f, τ) and its norm
‖X'(f, τ)‖ from the temporary memory 1180 and calculates a
norm-normalized vector X''(f, τ) as

X''(f, \tau) = X'(f, \tau) / \|X'(f, \tau)\|

(step S186).

[0175] The calculated norm-normalized vector X''(f, τ) is stored in
memory area 1312 of the memory 1100. With this, step S172 ends. (End
of the detailed description of the norm normalization (step S172))
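Steps S171 and S172 can be sketched together as follows. This is an illustrative sketch, not a definitive implementation. It assumes one frequency bin held as a complex array of shape (M, T), a non-zero reference element, and a hypothetical anechoic test model in which each sensor sees the source with a pure delay. The names `frequency_normalize`, `norm_normalize`, and `observe` and all numeric values are hypothetical.

```python
import numpy as np

def frequency_normalize(X, f, Q, c, d):
    """Frequency normalization of one bin, following Equation (53).

    X: complex array (M, T); f: frequency in Hz (f > 0); Q: index of the
    reference sensor; c: propagation speed; d: distance parameter.
    """
    phase = np.angle(X / X[Q])                     # arg[X_q / X_Q]
    return np.abs(X) * np.exp(1j * phase / (4.0 * f * d / c))

def norm_normalize(Xp):
    """Scale every column (time frame) to unit norm (steps S185 and S186)."""
    return Xp / np.linalg.norm(Xp, axis=0)

# Hypothetical anechoic model: sensor q observes the source with delay delays[q].
c, d = 340.0, 0.04
delays = np.array([0.0, 0.02, -0.03]) / c          # relative arrival times (s)

def observe(f, s):
    return np.exp(-2j * np.pi * f * delays)[:, None] * s[None, :]

s = np.array([1.0 + 1.0j, -0.5j])                  # source values in two frames
low = norm_normalize(frequency_normalize(observe(500.0, s), 500.0, 0, c, d))
high = norm_normalize(frequency_normalize(observe(1500.0, s), 1500.0, 0, c, d))
```

In this model the normalized vectors obtained at 500 Hz and 1500 Hz coincide, which is the sense in which the normalized vector no longer depends on frequency. The division by X_Q also removes the source value itself.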

Then, a centroid selecting section 1351ca of a centroid extracting
section 1351c reads cluster selection information ι from the temporary
memory 1180 (see step S141) and reads a centroid η_ι corresponding to the
cluster selection information ι from memory area 1110 of the memory 1100
(step S173). Then, the norm normalizing section 1351cb normalizes the
norm of the centroid η_ι read by the centroid selecting section 1351ca to a
predetermined value (the value at step S172, which is 1 in this example).
The centroid η_ι after norm normalization is referred to as a norm-normalized
centroid η_ι' (step S174). The procedure for norm normalization is the same
as the procedure at steps S185 and S186. The norm-normalized centroid η_ι'
is stored in memory area 1312 of the memory 1100.

[0176] Then, the squared distance computing section 1351d reads the
norm-normalized vector X''(f, τ) and the norm-normalized centroid η_ι' from
memory area 1312 of the memory 1100 and computes the squared distance
between them as

DS(f, \tau) = \| \eta_\iota' - X''(f, \tau) \|^2

(step S175) and stores the squared distance DS(f, τ) in memory area 1312.

Then, the function generating section 1351e reads the squared
distance DS(f, τ) from memory area 1312 of the memory 1100, uses a
function having the squared distance DS(f, τ) as its variable to generate a
time-frequency mask M(f, τ), and stores it in memory area 1312 of the
memory 1100 (step S176). In particular, the function generating section
1351e reads real-number parameters g and D_T from memory area 1308 of the
memory 1100 and generates a time-frequency mask M(DS(f, τ)), which is a
logistic function as given below. Here, the parameter D_T has been stored
previously in memory area 1308 and "e" is Napier's number.
[0177] [Formula 48]

M(DS(f, \tau)) = \frac{1}{1 + e^{g \cdot (DS(f, \tau) - D_T)}}   ... (56)

The time-frequency mask M(DS(f, τ)) thus generated is used in
masking in the masking section 1152 as in the third embodiment.
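Steps S174 to S176 can be sketched as a single function. This is a minimal sketch for one frequency bin, assuming the norm-normalized vectors are the columns of an array of shape (M, T). The function name `distance_mask` and the values of g and D_T are hypothetical.

```python
import numpy as np

def distance_mask(X_norm, centroid, g, D_T):
    """Logistic mask computed from the squared distance DS to the centroid.

    X_norm: (M, T) norm-normalized vectors; centroid: (M,) selected centroid.
    """
    unit_centroid = centroid / np.linalg.norm(centroid)           # step S174
    DS = np.sum(np.abs(unit_centroid[:, None] - X_norm) ** 2, axis=0)  # step S175
    return 1.0 / (1.0 + np.exp(g * (DS - D_T)))                   # step S176

centroid = np.array([2.0, 0.0])
X_norm = np.array([[1.0, 0.0],      # first column lies on the centroid direction,
                   [0.0, 1.0]])     # second column is orthogonal to it
mask = distance_mask(X_norm, centroid, g=10.0, D_T=0.5)
```

Vectors close to the selected centroid give a squared distance near 0 and a mask value near 1, while distant vectors are attenuated toward 0.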
[Experimental results]

In order to demonstrate effects of the third and fourth embodiments,
experiments were conducted to enhance and extract main speech emitted near
microphones. In the experiments, impulse responses h_qk(r) were measured
under the conditions shown in Fig. 31A. Loudspeakers were arranged in
such a manner that a cocktail party situation was simulated. Furthermore, the
sound levels of all loudspeakers were set to approximately equal values so
that no particular loudspeaker output sound at a significantly higher
volume level than the others. Mixtures at the microphones were generated by
convolving English speech sampled at 8 kHz for 6 seconds with the measured
impulse responses. The microphones were arranged three-dimensionally as
shown in Fig. 31A. A system (apparatus) containing the signal separating
apparatus was supplied with only information about the maximum distance
(3.5 cm) between the reference microphone (Mic.2) and other microphones
but not with further information about the layout of the microphones. In
each experiment, one of the four loudspeaker positions (a120, b120, c120, and
c170) near the microphones was selected as a target sound source and the
other three loudspeakers were kept silent. Six loudspeakers distant from the
microphones were outputting interfering sounds at all times during the
experiments. The results of the extraction were evaluated on the basis of
improvements in the signal-to-interference ratio (SIR), OutputSIR - InputSIR.
Greater values mean better extraction of a target speech and therefore higher
levels of suppression of the other interfering sounds. The two kinds of SIR
are defined by
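[0178] [Formula 49]

\mathrm{InputSIR} = 10 \log_{10} \frac{\langle |\sum_r h_{11}(r) \cdot s_1(t - r)|^2 \rangle_t}{\langle |\sum_{k \neq 1} \sum_r h_{1k}(r) \cdot s_k(t - r)|^2 \rangle_t} \ \mathrm{(dB)}

\mathrm{OutputSIR} = 10 \log_{10} \frac{\langle |\sum_r u_{11}(r) \cdot s_1(t - r)|^2 \rangle_t}{\langle |\sum_{k \neq 1} \sum_r u_{1k}(r) \cdot s_k(t - r)|^2 \rangle_t} \ \mathrm{(dB)}

where

u_{1k}(r) = \sum_{q=1}^{M} \sum_{\tau} w_{1q}(\tau) \cdot h_{qk}(r - \tau)

is an impulse response from s_k(t) to y_1(t).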
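The SIR evaluation can be sketched as below. This is an illustrative sketch that assumes the target and interference contributions are already available as separate time-domain arrays (for example, obtained by convolving each source with the relevant impulse responses). The improvement is taken here as output SIR minus input SIR, so that greater values mean better extraction. The function names and test signals are hypothetical.

```python
import numpy as np

def sir_db(target, interference):
    """10 log10 of the ratio of time-averaged powers, in dB."""
    return 10.0 * np.log10(np.mean(np.abs(target) ** 2)
                           / np.mean(np.abs(interference) ** 2))

def sir_improvement(target_in, interf_in, target_out, interf_out):
    """Output SIR minus input SIR; larger means better suppression."""
    return sir_db(target_out, interf_out) - sir_db(target_in, interf_in)

# Hypothetical signals: the interference amplitude is halved at the output.
target = 2.0 * np.ones(100)
interference = np.ones(100)
input_sir = sir_db(target, interference)
improvement = sir_improvement(target, interference, target, 0.5 * interference)
```

Halving the interference amplitude while leaving the target unchanged raises the SIR by about 6 dB.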
Sixteen combinations, each consisting of 7 speech signals (1 target speech
and 6 interfering speeches), were created for each target sound source position
for the experiments. Fig. 31B is a table showing average improvements in
SIR in the case where only ICA was used (the fourth embodiment) and in the
case where both ICA and time-frequency masking were used (the third
embodiment). Generally good improvements in SIR were obtained, with
slight variations depending on the positions of the target sound sources.
Good results were obtained at positions a120 and b120 because the interfering
sounds came from different positions. From a two-dimensional perspective,
positions c120 and c170 appear to be positions where it is difficult to extract
the target speech because many interfering sounds come from the same
direction. In fact, however, the results obtained at position c170 were excellent.
This is because position c170 was placed at a height different from
those of the interfering sounds and the system automatically uses the difference in
height to extract signals with the three-dimensionally arranged microphones.
The table in Fig. 31B shows that the performance is improved by the
time-frequency masking. Three parameters shown in Fig. 31A were used in
Equation (46) that determines a time-frequency mask. By using smaller
values of θ_T, greater SIR improvements are achieved. However, some of
the sounds obtained using smaller θ_T were accompanied by unnatural noise
(musical noise). The experiments showed that parameters (θ_T, g) = (0.333π,
20) sufficiently suppressed interfering sounds while providing natural sound.
[0179] [Variations]

The present invention is not limited to the third to fifth
embodiments described above. For example, while the signal separating
section 1130 computes a separation matrix W(f) consisting of M rows and M
columns in the embodiments described above, it may compute a non-square
separation matrix W(f) such as a matrix consisting of N rows and M columns.
In that case, basis vectors are the columns of a generalized inverse matrix
W^+(f) (for example, a Moore-Penrose generalized inverse) of the separation
matrix W(f).

While a time-frequency mask is used to further suppress interfering
signal components in selection signals Y_{I(f)}(f, τ) to generate masked selection
signals Y_{I(f)}'(f, τ) in the third embodiment, any other method may be used to
suppress interfering signal components to generate masked selection signals
Y_{I(f)}'(f, τ). For example, if there are only two signal sources, a
time-frequency mask may be generated that compares the magnitudes of the
extracted separated signals Y_1(f, τ) and Y_2(f, τ), and extracts Y_1(f, τ) as the
masked selection signal Y_{I(f)}'(f, τ) if |Y_1(f, τ)| > |Y_2(f, τ)|, or extracts the signal
Y_2(f, τ) as the masked selection signal Y_{I(f)}'(f, τ) if |Y_1(f, τ)| < |Y_2(f, τ)|. Then,
a vector consisting of the separated signals Y_1(f, τ) and Y_2(f, τ) is multiplied
by the generated time-frequency mask.
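The two-source comparison can be sketched as follows. This is a minimal sketch assuming the two separated signals are arrays of equal shape. The function name `select_dominant` is hypothetical, and the handling of exact ties (assigned to Y_2 here) is an assumption, since the text leaves that case open. Selecting with `np.where` is equivalent to multiplying the pair (Y_1, Y_2) by a binary mask of (1, 0) or (0, 1) in each slot.

```python
import numpy as np

def select_dominant(Y1, Y2):
    """Per time-frequency slot, keep whichever of Y1, Y2 has the larger magnitude.

    Ties go to Y2 (an arbitrary choice made for this sketch).
    """
    mask = np.abs(Y1) > np.abs(Y2)           # True where Y1 dominates
    return np.where(mask, Y1, Y2), mask

Y1 = np.array([3.0 + 0j, 0.5j, -1.0])
Y2 = np.array([1.0 + 0j, 2.0, 1.0j])
selected, mask = select_dominant(Y1, Y2)
```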

[0180] While the signal separating section 1130 uses independent
component analysis (ICA) to compute a separation matrix and separated
signals in the third embodiment, it may use a time-frequency mask (which is a
mask for each time-frequency slot, for example a binary mask that takes on the
value 1 or 0) to extract separated signals from observed signals (for example
see O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via
time-frequency masking," IEEE Trans. on SP, vol. 52, no. 7, pp. 1830-1847,
2004) and may generate a separation matrix from the result.

The first normalizing section 1142aa of the frequency normalizing
section 1142a in the third embodiment normalizes the arguments of the
components A_qp(f) of a basis vector A_p(f) by using one particular element
A_Qp(f) of that basis vector A_p(f) as the reference according to Equation (15),
which is a part of Equation (35). However, the first normalizing section
1142aa may use a particular element A_Qp(f) of a basis vector A_p(f) as the
reference to normalize the arguments of the components A_qp(f) of that basis
vector A_p(f) according to Equations (27-1) to (27-3) described above.
[0181] Furthermore, the frequency normalizing section 1142a may
perform frequency normalization by calculating Equations (28-1) to (28-4)
given above, instead of Equation (35).

While the norm normalizing section 1142b performs normalization
such that a norm has a value of 1 in the third embodiment, it may perform
normalization such that a norm has a predetermined value other than 1.
Furthermore, the norm normalizing section 1142b may be omitted, so that
norm normalization is not performed. In this case, clustering is
performed on the basis of the similarity in the directions of vectors as
described above.

[0182] The same value of parameter d may be set for all sensors q or
different values may be set for different sensors q. For example, the distance
between the reference sensor and a sensor q may be set as the value of
parameter d at the sensor q.

[Sixth embodiment (example of the third aspect of the invention)] 

The sixth embodiment of the present invention will be described
below.

The sixth embodiment uses the principles described above and uses
information obtained from all observed signals in a simple and efficient
manner to perform signal separation without needing precise positional
information about sensors. In the sixth embodiment, a "mixed-signal vector",
which will be described later, corresponds to the "complex vector" described
above.

[0183] <Configuration> 

Like the signal separating apparatus of the first embodiment, a
signal separating apparatus 2001 of the sixth embodiment is configured by
loading a signal separating program into a well-known von Neumann-type
computer. Fig. 32 is a block diagram showing an exemplary
configuration of the signal separating apparatus 2001 in the sixth embodiment.
Fig. 33 is a block diagram illustrating details of a signal separating section
2120 shown in Fig. 32. The solid arrows in Figs. 32 and 33 represent actual
data flows and the dashed arrows represent theoretical information flows.
Arrows representing flows of data inputted in and outputted from a control
section 2140 are omitted from Figs. 32 and 33.

[0184] As shown in Figs. 32 and 33, the signal separating apparatus 2001
includes a memory 2100, a frequency domain transforming section 2110
(including the functions of the "complex vector generating section"), the
signal separating section 2120, a time domain transforming section 2130,
and the control section 2140. The signal separating section 2120 includes a
frequency normalizing section 2121 (constituting the "normalizing section"),
a norm normalizing section 2122 (constituting the "normalizing section"), a
clustering section 2123, and a separated signal generating section 2124. The
frequency normalizing section 2121 includes a first normalizing section
2121a and a second normalizing section 2121b. The control section 2140
has a temporary memory 2141.

[0185] The memory 2100 and the temporary memory 2141 correspond to
storage such as a register 10ac, an auxiliary storage device 10f, and a RAM
10d. The frequency domain transforming section 2110, the signal separating
section 2120, the time domain transforming section 2130, and the control
section 2140 are configured when an OS program and the signal separating
program are read into the CPU 10a and the CPU 10a executes them.
<Processing> 

Processing performed in the signal separating apparatus 2001 will
be described below. In the following description, a situation will be dealt
with in which N source signals are mixed and observed by M sensors. The
assumption is that mixed signals x_q(t) (q = 1, ..., M) in the time domain
observed at the sensors are stored in memory area 2101 of the memory 2100,
and that the signal transmission speed c, reference values Q and Q' selected
from natural numbers less than or equal to M (each being the suffix of a
reference sensor selected from among the M sensors), and the value of the
real-number parameter d are stored in memory area 2105.
[0186] Fig. 34 is a flowchart outlining the whole processing in the signal
separating apparatus 2001 according to the sixth embodiment. The
processing by the signal separating apparatus 2001 of the sixth embodiment
will be described with reference to the flowchart.
[Overview of processing] 

First, the frequency domain transforming section 2110 reads the mixed
signals x_q(t) in the time domain from memory area 2101 of the memory 2100,
transforms them into time-series signals of individual frequencies (referred to
as "frequency-domain mixed signals") X_q(f, τ) (q = 1, ..., M and
f = 0, f_s/L, ..., f_s(L - 1)/L, where f_s is the sampling frequency) by
applying a transformation such as a short-time discrete Fourier
transformation, and stores them in
memory area 2102 of the memory 2100 (step S201).
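
As a rough illustration of step S201, the following Python sketch computes a short-time discrete Fourier transform of one sensor signal using only the standard library. The Hann window, the hop size, and the function name `stft` are assumptions made for this example; the text only requires a transformation such as a short-time discrete Fourier transformation, and bin k of a frame of length L corresponds to the frequency f = k·f_s/L.

```python
import cmath
import math
from typing import List


def stft(x: List[float], frame_len: int, hop: int) -> List[List[complex]]:
    """Return frames[t][k], an estimate of X(f_k, tau_t) for one sensor.

    A direct O(L^2) DFT is used per frame for clarity; a real system would
    use an FFT. The periodic Hann window is an assumption of this sketch.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len)
        ]
        frames.append(spectrum)
    return frames
```

Applying `stft` to each of the M sensor signals and collecting the k-th bin of the t-th frame across sensors gives one mixed-signal vector X(f, τ).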
[0187] Then, the frequency normalizing section 2121 of the signal
separating section 2120 reads the frequency-domain mixed signals X_q(f, τ)
from memory area 2102 of the memory 2100. After reading the
frequency-domain mixed signals X_q(f, τ), the frequency normalizing section
2121 normalizes a mixed-signal vector X(f, τ) = [X_1(f, τ), ..., X_M(f, τ)]^T
consisting of those signals into a frequency-normalized vector X'(f, τ) that is
independent of frequency f (step S202). The generated
frequency-normalized vectors X'(f, τ) are stored in memory area 2103 of the
memory 2100. Details of step S202 will be described later.

[0188] Then, the norm normalizing section 2122 of the signal separating
section 2120 reads the frequency-normalized vectors X'(f, τ) from memory
area 2103 of the memory 2100 and normalizes them into norm-normalized
vectors X"(f, τ) whose norm has a predetermined value (for example, 1). The
norm normalizing section 2122 then stores the generated norm-normalized
vectors X"(f, τ) in memory area 2104 of the memory 2100 (step S203).
Details of this operation will be described later.
Then, the clustering section 2123 of the signal separating section
2120 reads the norm-normalized vectors X"(f, τ) from memory area 2104 of
the memory 2100 and clusters them to generate clusters. The clustering
section 2123 then stores cluster information C_k identifying each cluster
(information identifying the members X"(f, τ) of the k-th cluster
(k = 1, ..., N)) in memory area 2106 of the memory 2100 (step S204).
Details of this operation will be described later.

[0189] Then, the separated signal generating section 2124 of the signal
separating section 2120 reads the cluster information C_k and the reference
value Q' from memory areas 2106 and 2105, respectively, of the memory
2100. The separated signal generating section 2124 then uses the cluster
information C_k and the reference value Q' to extract from memory area 2102
the Q'-th element X_{Q'}(f, τ) of the mixed-signal vector X(f, τ) corresponding
to each norm-normalized vector X"(f, τ) belonging to the k-th cluster and
generates a separated signal vector Y(f, τ) having the element as its k-th
element Y_k(f, τ). The separated signal generating section 2124 then stores
the generated separated signal vector Y(f, τ) in memory area 2107 of the
memory 2100 (step S205). Details of this operation will be described later.
[0190] Finally, the time domain transforming section 2130 reads the
separated signal vector Y(f, τ) from memory area 2107 of the memory 2100
and transforms each of its separated signal components Y_k(f, τ) by using a
transformation such as a short-time inverse Fourier transformation into a
time-domain separated signal y_k(t) for each suffix k. The time domain
transforming section 2130 then stores the transformed, time-domain separated
signals y_k(t) in memory area 2108 of the memory 2100 (step S206).

Details of the operations will be described below.
[Details of processing by the frequency normalizing section 2121 and the
norm normalizing section 2122]

The frequency normalizing section 2121 and the norm normalizing
section 2122 normalize all mixed-signal vectors
X(f, τ) = [X_1(f, τ), ..., X_M(f, τ)]^T (f = 0, f_s/L, ..., f_s(L - 1)/L) to
norm-normalized vectors X"(f, τ) that are independent of frequency and
dependent only on the positions of signal sources. This normalization
ensures that each cluster formed by clustering at step S204 corresponds to a
single signal source. If this normalization is not properly performed,
clusters are not formed. As described earlier,
normalization in the sixth embodiment consists of frequency normalization
and norm normalization. The frequency normalization is performed by the
frequency normalizing section 2121 to normalize mixed-signal vectors X(f, τ)
into frequency-normalized vectors X'(f, τ) that are independent of frequency.
The norm normalization is performed by the norm normalizing section 2122
to normalize the frequency-normalized vectors X'(f, τ) into norm-normalized
vectors X"(f, τ) whose norm has a predetermined value (1 in this example).
These normalizations will be detailed below.

[0191] [Details of processing by the frequency normalizing section 2121
(processing at step S202)]

Fig. 35A is a flowchart illustrating details of processing at step
S202 shown in Fig. 34. With reference to the flowchart, details of
processing at step S202 will be described below.

First, the control section 2140 (Fig. 32) assigns 1 to parameter q and
stores it in the temporary memory 2141 (step S211). Then, the frequency
normalizing section 2121 (Figs. 32 and 33) reads the parameters d, c, and Q
described earlier from memory area 2105 of the memory 2100, reads the
element X_q(f, τ) of the mixed-signal vector X(f, τ) corresponding to each
(f, τ) from memory area 2102, and reads parameter q from the temporary memory
2141. The frequency normalizing section 2121 then calculates
[0192] [Formula 50]

$$X_q'(f,\tau) = |X_q(f,\tau)| \exp\left[ j \frac{\arg[X_q(f,\tau)/X_Q(f,\tau)]}{4 f c^{-1} d} \right] \qquad (60)$$

and stores the result in memory area 2103 of the memory 2100 as the
components of a frequency-normalized vector
X'(f, τ) = [X_1'(f, τ), ..., X_M'(f, τ)]^T (step S212). Here, arg[·]
represents an argument and j represents the imaginary unit.
In particular, the first normalizing section 2121a of the frequency
normalizing section 2121 first normalizes the argument of each component
X_q(f, τ) of the mixed-signal vector X(f, τ) on the basis of a particular element
X_Q(f, τ) of the mixed-signal vector X(f, τ) by the following operation:
[0193] [Formula 51]

$$X_q'''(f,\tau) = |X_q(f,\tau)| \exp\left\{ j \arg[X_q(f,\tau)/X_Q(f,\tau)] \right\} \qquad (61)$$
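
The first-stage operation of Equation (61) can be sketched in Python as below. The function name and the 0-based reference index `ref` (standing in for the 1-based suffix Q) are assumptions made for illustration, and the reference element is assumed to be nonzero.

```python
import cmath
from typing import List


def reference_normalize(x: List[complex], ref: int) -> List[complex]:
    """First-stage normalization in the style of Equation (61).

    Each magnitude |X_q| is kept and each argument is replaced by
    arg[X_q / X_Q], so the reference element ends up with argument 0.
    Assumes x[ref] != 0; ref is 0-based (an assumption of this sketch).
    """
    return [abs(xq) * cmath.exp(1j * cmath.phase(xq / x[ref])) for xq in x]
```

Multiplying every element of the input by the same unit-magnitude complex factor leaves the output unchanged, which is the sense in which the phase of the source signal is removed at this stage.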

Then, the second normalizing section 2121b of the frequency
normalizing section 2121 divides the argument of each element X_q'''(f, τ)
normalized by the first normalizing section 2121a by a value 4fc^{-1}d
proportional to frequency f as given below.
[0194] [Formula 52]

$$X_q'(f,\tau) = |X_q'''(f,\tau)| \exp\left[ j \frac{\arg[X_q'''(f,\tau)]}{4 f c^{-1} d} \right] \qquad (62)$$

Then, the control section 2140 determines whether the value of
parameter q stored in the temporary memory 2141 satisfies q = M (step S213).
If q = M is not satisfied, the control section 2140 sets the calculation result
q + 1 as a new value of parameter q, stores it in the temporary memory 2141
(step S214), and then returns to step S212. On the other hand, if q = M, the
control section 2140 terminates step S202 and causes step S203 to be executed.
[0195] [Details of processing by the norm normalizing section 2122 (details
of step S203)]

Fig. 35B is a flowchart illustrating details of processing at step
S203 shown in Fig. 34. With reference to the flowchart, processing at step
S203 will be detailed below.

The norm normalizing section 2122 (Figs. 32 and 33) reads the
frequency-normalized vectors X'(f, τ) = [X_1'(f, τ), ..., X_M'(f, τ)]^T
corresponding to each (f, τ) from memory area 2103 of the memory 2100. The
norm normalizing section 2122 then calculates their norms ||X'(f, τ)|| as
[0196] [Formula 53]

$$\|X'(f,\tau)\| = \sqrt{\sum_{q=1}^{M} |X_q'(f,\tau)|^2}$$

and stores the frequency-normalized vectors X'(f, τ) and their norms
||X'(f, τ)|| in the temporary memory 2141 (step S221).

Then, the norm normalizing section 2122 reads the
frequency-normalized vectors X'(f, τ) corresponding to each (f, τ) and their
norms ||X'(f, τ)|| from the temporary memory 2141 and calculates
norm-normalized vectors X"(f, τ) as

X"(f, τ) = X'(f, τ)/||X'(f, τ)|| ... (63)

(step S222). The calculated norm-normalized vectors X"(f, τ) are stored in
memory area 2104 of the memory 2100 and, with this, the processing at step
S203 ends.
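
Steps S202 and S203 together can be sketched as two small Python functions. This is a minimal sketch under stated assumptions: the function names, the 0-based reference index `ref`, and the restriction to f > 0 and a nonzero reference element are choices made for the example and are not taken from the text.

```python
import cmath
import math
from typing import List


def frequency_normalize(x: List[complex], f: float, ref: int,
                        c: float, d: float) -> List[complex]:
    """Frequency normalization in the style of Equation (60).

    The argument of X_q relative to the reference X_Q is divided by
    4 f c^{-1} d. Assumes f > 0 and x[ref] != 0.
    """
    if f <= 0.0:
        raise ValueError("this sketch only handles f > 0")
    scale = 4.0 * f * d / c
    return [abs(xq) * cmath.exp(1j * cmath.phase(xq / x[ref]) / scale)
            for xq in x]


def norm_normalize(xp: List[complex], target: float = 1.0) -> List[complex]:
    """Norm normalization in the style of Equation (63), to a chosen norm."""
    norm = math.sqrt(sum(abs(v) ** 2 for v in xp))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v * (target / norm) for v in xp]
```

For example, with c = 340 m/s, d = 0.04 m and a second sensor that is 0.02 m farther from the source than the reference sensor, the normalized argument of the second element is -(π/2)(0.02/0.04) = -π/4 regardless of the frequency used.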

[0197] The norm-normalized vectors X"(f, τ) thus generated are
independent of frequency and dependent only on the positions of the signal
sources. Consequently, the norm-normalized vectors X"(f, τ) form clusters.
The reason why they form clusters will be described below.

[Reason why norm-normalized vectors X"(f, τ) form clusters]

Because the sixth embodiment assumes the sparseness of source
signals, each of the components X_q(f, τ) of a mixed-signal vector X(f, τ) is
proportional to the frequency response H_qk from the signal source k
corresponding to the source signal k to a sensor q, the factor of
proportionality being the source signal S_k(f, τ), which is a complex scalar
(X_q(f, τ) = H_qk(f)·S_k(f, τ)).
[0198] These source signals S_k(f, τ) change with discrete time (that is,
their phase changes). Of course, if the frequency f is the same, the relative
value between the argument of a source signal S_k(f, τ) observed at a sensor q
and the argument of the source signal S_k(f, τ) observed at reference sensor Q
does not vary with discrete time.

As described above, the first normalizing section 2121a of the
frequency normalizing section 2121 normalizes the argument of each element
X_q(f, τ) of a mixed-signal vector X(f, τ) on the basis of a particular element
X_Q(f, τ) of the mixed-signal vector X(f, τ) as a reference.

[0199] In this way, uncertainty due to the phase of the source signals
S_k(f, τ) is eliminated. Thus the argument of each element X_q(f, τ) of the
mixed-signal vector X(f, τ) that corresponds to the source signal k and sensor
q is represented as a value relative to the argument of the element X_Q(f, τ) of
the mixed-signal vector X(f, τ) that corresponds to the source signal k and
reference sensor Q (corresponding to reference value Q). In this case, the
relative value corresponding to the argument of the element X_Q(f, τ) is
represented as 0.

The frequency response from the signal source k to the sensor q is
approximated by using a direct-wave model without reflections and
reverberations. Then, the argument normalized by the first normalizing
section 2121a described above is proportional to both the arrival time
difference of a wave from the signal source k to the sensors and the frequency f.
Here, the arrival time difference is the difference between the time at which a
wave from the signal source k reaches the sensor q and the time at which the
wave reaches the sensor Q.

[0200] As described above, the second normalizing section 2121b divides
the argument of each component X_q'''(f, τ) normalized by the first normalizing
section 2121a by a value proportional to frequency f. Thus, each
element X_q'''(f, τ) is normalized to an element X_q'(f, τ) whose argument no
longer depends on frequency. Consequently, the normalized
elements X_q'(f, τ) will be dependent only on the arrival time difference of the
wave from the signal source k to the sensors. Here, the arrival time
difference of the wave from the signal source k to the sensors is only
dependent on the relative positions of the signal source k, sensor q, and
reference sensor Q. Therefore, for the same signal source k, sensor q, and
reference sensor Q, the elements X_q'(f, τ) have the same argument even if the
frequency f differs. Thus, the frequency-normalized vector X'(f, τ) is
independent of frequency f and is dependent only on the position of the signal
source k. Therefore, clustering of the norm-normalized vectors X"(f, τ)
generated by normalization of the norms of the frequency-normalized vectors
X'(f, τ) generates clusters each of which corresponds to the same signal
source. In a real environment, the direct-wave model is not exactly satisfied
because of the effects of reflections and reverberations. However, it
provides a sufficiently good approximation as shown by experimental results,
which will be given later.

[0201] The reason why the norm-normalized vectors X"(f, τ) form clusters
will be described with respect to a model.

The impulse responses h_qk(r) represented by Equation (1) given
earlier are approximated by using a direct-wave (near-field) mixture model and
represented in the frequency domain as
[0202] [Formula 54]

$$H_{qk}(f) = \frac{\gamma(f)}{d_{qk}} \exp\left[ -j 2\pi f c^{-1} (d_{qk} - d_{Qk}) \right] \qquad (64)$$

where d_qk is the distance between a signal source k and sensor q and γ(f) is a
constant dependent on frequency. The attenuation γ(f)/d_qk is determined by
the distance d_qk and the constant γ(f), and the delay (d_qk - d_Qk)/c is
determined by the distance normalized by using the position of sensor Q.
Assuming that the signals have sparseness, the following
relationship holds at each time-frequency (f, τ).
[0203] X_q(f, τ) = H_qk(f)·S_k(f, τ) ... (65)
From Equations (62), (63), (64), and (65), it follows that
[0204] [Formula 55]

$$X_q''(f,\tau) = \frac{1}{d_{qk} D} \exp\left[ -j \frac{\pi}{2} \frac{d_{qk} - d_{Qk}}{d} \right], \quad D = \sqrt{\sum_{i=1}^{M} \frac{1}{d_{ik}^2}} \qquad (66)$$

As can be seen from this equation, the elements X_q"(f, τ) of the
norm-normalized vector X"(f, τ) are independent of the frequency f and are
dependent only on the positions of the signal sources k and sensors q.
Therefore, when norm-normalized vectors are clustered, each of the clusters
formed corresponds to the same signal source.
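
The frequency independence stated by Equation (66) can be checked numerically with a short Python sketch. Everything specific here is an assumption chosen for illustration: the speed of sound C, the sensor distances, the value of d, and the function names. The check is only valid while the phase difference between sensors stays within ±π (no spatial aliasing), which holds for the distances and frequencies used below.

```python
import cmath
import math
from typing import List

C = 340.0  # assumed signal transmission speed in m/s


def direct_wave_mixture(dists: List[float], f: float, s: complex,
                        ref: int = 0, gamma: float = 1.0) -> List[complex]:
    """X_q = H_qk(f) * S_k with H_qk from the direct-wave model (Eq. 64)."""
    return [(gamma / dq)
            * cmath.exp(-2j * math.pi * f * (dq - dists[ref]) / C) * s
            for dq in dists]


def normalize(x: List[complex], f: float, d: float, ref: int = 0) -> List[complex]:
    """Frequency normalization (Eq. 60) followed by norm normalization (Eq. 63)."""
    scale = 4.0 * f * d / C
    xp = [abs(v) * cmath.exp(1j * cmath.phase(v / x[ref]) / scale) for v in x]
    norm = math.sqrt(sum(abs(v) ** 2 for v in xp))
    return [v / norm for v in xp]


def predicted(dists: List[float], d: float, ref: int = 0) -> List[complex]:
    """Closed form of Equation (66), which contains neither f nor S_k."""
    big_d = math.sqrt(sum(1.0 / di ** 2 for di in dists))
    return [cmath.exp(-1j * (math.pi / 2.0) * (dq - dists[ref]) / d) / (dq * big_d)
            for dq in dists]
```

With source-to-sensor distances of, say, 1.00 m, 1.02 m and 0.97 m and d = 0.04 m, `normalize(direct_wave_mixture(...))` gives the same vector as `predicted(...)` at 500 Hz and at 3000 Hz, whatever complex value is used for the source signal.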

[0205] The same applies to near-field and far-field mixture models that do
not take attenuation of signals into consideration (as in the first embodiment).

It can be seen from Equation (66) that the value of parameter d is
preferably d > d_max/2 (where d_max represents the maximum distance between
the reference sensor corresponding to the element X_Q"(f, τ) and another
sensor), more preferably d ≥ d_max, and yet more preferably d = d_max, as with
the first embodiment.

Figs. 37 and 38 are complex planes illustrating the relationship
between an element X_q"(f, τ) of a norm-normalized vector X"(f, τ) for each
value of parameter d and its argument arg[X_q"(f, τ)]. The horizontal axis in
the planes represents the real axis and the vertical axis represents the
imaginary axis.

[0206] Fig. 37A is a complex plane showing the relationship
when d_max/2 ≥ d. Here, from the definition of d_max given above, the absolute
value of d_qk - d_Qk is less than or equal to d_max for any q and k. Therefore,
if d_max/2 ≥ d, then (π/2)·(d_qk - d_Qk)/d ≤ -π and (π/2)·(d_qk - d_Qk)/d ≥ π
are possible. Consequently, the arguments arg[X_q"(f, τ)] of X_q"(f, τ)
represented by Equation (66) can be distributed over a range exceeding 2π,
that is, α_1 ≤ arg[X_q"(f, τ)] ≤ α_2 (α_1 ≤ -π, α_2 ≥ π). Accordingly, the
arguments of elements X_q"(f, τ) of different norm-normalized vectors X"(f, τ)
can be identical and therefore the different norm-normalized vectors X"(f, τ)
can be clustered in the same cluster by the clustering described above.
Therefore, it is desirable that d > d_max/2. However, if there are no samples
of norm-normalized vectors X"(f, τ) that correspond to the range in which the
arguments overlap, no problem arises even if d_max/2 ≥ d.

[0207] Fig. 37B is a complex plane showing the case where
d_max/2 < d < d_max. In this case, the relationships
-π < (π/2)·(d_qk - d_Qk)/d < -π/2 and π/2 < (π/2)·(d_qk - d_Qk)/d < π are
possible. Consequently, the arguments arg[X_q"(f, τ)] of X_q"(f, τ)
represented by Equation (66) can be distributed over the range
β_1 ≤ arg[X_q"(f, τ)] ≤ β_2 (-π < β_1 < -π/2, π/2 < β_2 < π).
Accordingly, it is possible that the distance between elements of different
norm-normalized vectors X"(f, τ) does not monotonically increase with
increasing difference between the arguments of elements of different
norm-normalized vectors X"(f, τ) in the ranges -π < arg[X_q"(f, τ)] < -π/2 and
π/2 < arg[X_q"(f, τ)] < π. This can degrade the accuracy of the clustering
described above. Therefore it is desirable that d ≥ d_max.
[0208] Fig. 38A is a complex plane of the case where d = d_max and Fig.
38B is a complex plane of the case where d > d_max. Here, if d > d_max, the
relationships -π/2 < (π/2)·(d_qk - d_Qk)/d < 0 and
0 < (π/2)·(d_qk - d_Qk)/d < π/2 are possible.
As a result, the arguments arg[X_q"(f, τ)] of X_q"(f, τ) represented by Equation
(66) are distributed over the range γ_1 ≤ arg[X_q"(f, τ)] ≤ γ_2
(-π/2 < γ_1 < 0 and 0 < γ_2 < π/2) as shown in Fig. 38B. As the value of d
increases, the distribution range becomes narrower and clusters are
distributed more densely in the narrow range. This degrades the accuracy of
the clustering described above.
[0209] On the other hand, if d = d_max, the relationships
-π/2 ≤ (π/2)·(d_qk - d_Qk)/d < 0 and 0 < (π/2)·(d_qk - d_Qk)/d ≤ π/2 are
possible. Consequently, the arguments arg[X_q"(f, τ)] of X_q"(f, τ)
represented by Equation (66) are distributed over the range
-π/2 ≤ arg[X_q"(f, τ)] ≤ π/2 as shown in Fig. 38A. In this case, clusters can
be distributed over a range as wide as possible while maintaining the
relationship in which the distance between elements of norm-normalized
vectors X"(f, τ) monotonically increases as the difference between the
arguments of the elements increases.
Consequently, the accuracy of clustering can be improved in general. [End
of the detailed description of (the processing by the frequency normalizing
section 2121 and the norm normalizing section 2122)]
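
The four cases for parameter d discussed with reference to Figs. 37A to 38B can be summarized in a small Python sketch. The function names and the returned labels are hypothetical and exist only for this illustration; the bound itself follows from Equation (66) and |d_qk - d_Qk| ≤ d_max, and the boundary case d = d_max/2 is grouped with Fig. 37A as an assumption.

```python
import math


def max_argument(d: float, d_max: float) -> float:
    """Largest possible |arg X_q''| under Eq. (66): (pi/2) * d_max / d."""
    return (math.pi / 2.0) * d_max / d


def regime(d: float, d_max: float) -> str:
    """Name the figure whose case a given value of d falls into."""
    if d <= d_max / 2.0:
        return "37A: arguments may reach or pass +/-pi"
    if d < d_max:
        return "37B: distance not monotonic in argument difference"
    if d == d_max:  # exact comparison is fine for this illustration only
        return "38A: widest monotonic range"
    return "38B: range narrower than +/-pi/2"


def phasor_distance(a: float, b: float) -> float:
    """Euclidean distance between the unit phasors exp(ja) and exp(jb)."""
    return 2.0 * abs(math.sin((a - b) / 2.0))
```

`phasor_distance` grows with the argument difference only up to a difference of π, which is why keeping all arguments within [-π/2, π/2] (the case d = d_max) preserves monotonicity while using the widest possible range.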

[0210] [Details of processing by the clustering section 2123 (details of step
S204)]

As described earlier, the clustering section 2123 reads the
norm-normalized vectors X"(f, τ) from memory area 2104 of the memory
2100 and clusters them into N clusters. This clustering is performed so that
the total sum U of the sums of squares U_k of the distances between the
members of the clusters (X"(f, τ) ∈ C_k) and their centroids η_k
[0211] [Formula 56]

$$U_k = \sum_{X''(f,\tau) \in C_k} \| X''(f,\tau) - \eta_k \|^2$$

is minimized. The minimization can be performed effectively by using the
k-means clustering described in Non-patent literature 6, for example. The
centroid (center vector) η_k of the cluster identified by cluster information
C_k can be calculated as
[0212] [Formula 57]

$$\eta_k = \frac{\sum_{X''(f,\tau) \in C_k} X''(f,\tau) / |C_k|}{\left\| \sum_{X''(f,\tau) \in C_k} X''(f,\tau) / |C_k| \right\|}$$

where |C_k| is the number of members (norm-normalized vectors X"(f, τ)) of
the cluster identified by cluster information C_k. While the distance used here
is the square of the Euclidean distance, it may be the Minkowski distance,
which is a generalization of the Euclidean distance. [End of the
detailed description of (the processing by the clustering section 2123)]
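
A minimal k-means sketch in Python that follows Formulas 56 and 57 is given below. It is not the procedure of Non-patent literature 6; the fixed iteration count, the caller-supplied initial centroids, the handling of empty clusters, and all names are assumptions made for this illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[complex]


def squared_distance(a: Sequence[complex], b: Sequence[complex]) -> float:
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b))


def unit_centroid(members: Sequence[Vector]) -> Vector:
    """Mean of the members rescaled to unit norm, as in Formula 57."""
    mean = [sum(column) / len(members) for column in zip(*members)]
    norm = math.sqrt(sum(abs(v) ** 2 for v in mean))
    return [v / norm for v in mean] if norm > 0.0 else mean


def kmeans(vectors: Sequence[Vector], initial: Sequence[Vector],
           iterations: int = 20) -> Tuple[List[int], List[Vector]]:
    """Alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in initial]
    labels: List[int] = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = unit_centroid(members)
    return labels, centroids
```

The returned `labels` play the role of the cluster information C_k: `labels[i] == k` means that the i-th norm-normalized vector is a member of the k-th cluster (0-based in this sketch).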
[Details of processing by the separated signal generating section 2124 (details
of step S205)]

Fig. 36 is a flowchart illustrating details of processing at step S205
shown in Fig. 34. With reference to the flowchart, details of processing at
step S205 will be described below.

[0213] First, the control section 2140 (Fig. 32) initializes the values of
Y_k(f, τ) for all values of parameter k (k = 1, ..., N) and all time-frequencies
(f, τ) (all f and τ in a defined range) to 0 and stores them in memory area
2107 of the memory 2100 (step S230).

The control section 2140 then assigns 1 to parameter k and stores it
in the temporary memory 2141 (step S231). Then the separated signal
generating section 2124 (Figs. 32 and 33) reads the cluster information C_k
from memory area 2106 of the memory 2100, extracts the members
(norm-normalized vectors X"(f, τ)) of the k-th cluster identified by the cluster
information C_k, and stores them in the temporary memory 2141 (step S232).
The separated signal generating section 2124 then refers to the
norm-normalized vectors X"(f, τ) stored at step S232 in the temporary
memory 2141, reads the mixed-signal vectors X(f, τ) at the time-frequencies
(f, τ) corresponding to the norm-normalized vectors X"(f, τ) from memory area
2102 of the memory 2100, and stores them in the temporary memory 2141
(step S233). Then, the separated signal generating section 2124 reads the
reference value Q' from memory area 2105 of the memory 2100 and extracts
(for each time-frequency (f, τ)) the Q'-th element X_{Q'}(f, τ) of the
mixed-signal vector X(f, τ) stored at step S233. The separated signal
generating section 2124 updates the values in memory area 2107 of the memory
2100 by setting the extracted element X_{Q'}(f, τ) as the k-th element
Y_k(f, τ) of the separated-signal vector Y(f, τ) (step S234). That is, the
separated signal generating section 2124 in this example extracts the element
Y_k(f, τ) as
[0214] [Formula 58]

$$Y_k(f,\tau) = \begin{cases} X_{Q'}(f,\tau) & X''(f,\tau) \in C_k \\ 0 & \text{otherwise} \end{cases}$$

Then, the control section 2140 determines whether the value of
parameter k stored in the temporary memory 2141 satisfies k = N (step S235).
If k = N is not satisfied, the control section 2140 sets the calculation result
k + 1 as a new value of parameter k, stores it in the temporary memory 2141
(step S236), and then returns to step S232. On the other hand, if k = N, the
control section 2140 terminates processing at step S205. [End of the detailed
description of
(processing by the separated signal generating section 2124)]
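
The loop of steps S230 to S236 can be sketched in Python as follows. The dictionary-based data layout, the 0-based indices for the cluster number and the reference sensor, and the function name are all assumptions of this sketch rather than details taken from the text.

```python
from typing import Dict, List, Tuple

Bin = Tuple[int, int]  # (frequency index, frame index), hypothetical layout


def generate_separated(mixed: Dict[Bin, List[complex]],
                       labels: Dict[Bin, int],
                       n_sources: int,
                       ref: int) -> Dict[Bin, List[complex]]:
    """Build Y(f, tau) from cluster membership, in the style of Formula 58.

    mixed[tf] is the mixed-signal vector X(f, tau); labels[tf] is the index
    of the cluster that the corresponding normalized vector belongs to.
    """
    # Step S230: initialize every Y_k(f, tau) to zero.
    separated = {tf: [0j] * n_sources for tf in mixed}
    # Steps S231, S235, S236: loop over the clusters.
    for k in range(n_sources):
        for tf, x in mixed.items():
            if labels[tf] == k:            # step S232: member of cluster k
                separated[tf][k] = x[ref]  # steps S233, S234: copy X_{Q'}
    return separated
```

Each time-frequency bin therefore contributes its reference-sensor observation to exactly one separated signal and zero to all others.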
[0215] <Experimental results> 

Results of experiments on sound source separation according to the
sixth embodiment will be given below. In order to demonstrate the effects of
the sixth embodiment, experiments on two types of signal separation were
conducted.

In the first separation experiment, two sensors were used. Conditions
of the experiment are shown in Fig. 39A. Three signal sources were used
and English speech was emitted for 6 seconds through loudspeakers. The table
in Fig. 39B shows the results of the experiment. As shown in the table, the
SIRs (signal-to-interference ratios) are improved. Greater numeric values
represent better separation performance. Shown in the table are
observations by the sensors before separation (InputSIR), results obtained
using clustering with DOA (DOA (Previous)), and results obtained using the
sixth embodiment (clustering using normalization) (Normalized obser. vector
(Proposed)). The results reveal that, when two sensors are used, the method
of the sixth embodiment can achieve signal separation with performance
equivalent to the performance achieved by clustering using DOA alone.
[0216] In the second experiment, randomly arranged sensors were used.
Experimental conditions are shown in Fig. 40A. In the experiment, four
omnidirectional microphones (sensors) were nonlinearly arranged.
The only information provided to the separation system about the arrangement
of the sensors was that the maximum distance between microphones was 4 cm.
Four signal sources were used to emit English speech for 6
seconds through loudspeakers. If DOAs were used in this arrangement of
sensors and signal sources, a complicated process would have to be performed
in which the DOA of each sensor pair is estimated, clustering is performed for
each sensor pair, and then the results of clustering at all sensor pairs are
combined. The method of the sixth embodiment can achieve high separation
performance as shown in the table in Fig. 40B without needing such a
complicated combining operation. Furthermore, the second experiment
conducted under the conditions shown in Fig. 41A also showed high
separation performance as shown in the table in Fig. 41B.
[0217] <Features of the sixth embodiment>

The features of the sixth embodiment are summarized below.
(1) Because all information obtained from mixed-signal vectors is
used for clustering, information about all sensors can be effectively used and
therefore the performance of signal separation is improved.

(2) Because precise information about the positions of sensors is not
needed, a random arrangement of sensors can be used and sensor position
calibration is not required.
<Variations> 

The present invention is not limited to the sixth embodiment
described above. For example, the first normalizing section 2121a of the
frequency normalizing section 2121 in the sixth embodiment normalizes the
argument of each element X_q(f, τ) of a mixed-signal vector X(f, τ) on the
basis of a particular element X_Q(f, τ) of the mixed-signal vector X(f, τ)
according to Equation (61). However, the first normalizing section 2121a of
the frequency normalizing section 2121 may normalize the argument of each
element X_q(f, τ) of a mixed-signal vector X(f, τ) on the basis of a particular
element X_Q(f, τ) of the mixed-signal vector X(f, τ) according to any of the
following equations.

[0218] [Formula 59]

$$X_q'''(f,\tau) = |X_q(f,\tau)| \exp\{ j \cdot (\arg[X_q(f,\tau) \cdot X_Q^*(f,\tau)]) \}$$

$$X_q'''(f,\tau) = |X_q(f,\tau)| \exp\{ j \cdot (\arg[X_q(f,\tau)] - \arg[X_Q(f,\tau)]) \}$$

$$X_q'''(f,\tau) = |X_q(f,\tau)| \exp\{ j \cdot \Psi(\arg[X_q(f,\tau)/X_Q(f,\tau)]) \}$$

Here, "·*" denotes the complex conjugate of "·" and "Ψ{·}" is a function, preferably
a monotonically increasing function from the viewpoint of clustering accuracy.
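
The first two alternative forms can be compared with the division form of Equation (61) in a few lines of Python. The function names are hypothetical. Note that the difference of two arguments may lie outside (-π, π], but this does not change the value of exp{j·(...)}, so all three first-stage results coincide up to rounding.

```python
import cmath


def by_ratio(xq: complex, x_ref: complex) -> complex:
    """Division form, as in Equation (61)."""
    return abs(xq) * cmath.exp(1j * cmath.phase(xq / x_ref))


def by_conjugate(xq: complex, x_ref: complex) -> complex:
    """Conjugate-product form: arg[X_q * conj(X_Q)]."""
    return abs(xq) * cmath.exp(1j * cmath.phase(xq * x_ref.conjugate()))


def by_difference(xq: complex, x_ref: complex) -> complex:
    """Argument-difference form: arg[X_q] - arg[X_Q]."""
    return abs(xq) * cmath.exp(1j * (cmath.phase(xq) - cmath.phase(x_ref)))
```

The conjugate-product form avoids a complex division, which may be a reason to prefer it in practice, although the text does not state a preference.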

The frequency normalizing section 2121 may perform the frequency
normalization by using any of the following equations
[0219] [Formula 60]

$$X_q'(f,\tau) = \rho \cdot \frac{\arg[X_q(f,\tau)/X_Q(f,\tau)]}{4 f c^{-1} d}$$

$$X_q'(f,\tau) = \rho \cdot \frac{\arg[X_q(f,\tau) \cdot X_Q^*(f,\tau)]}{4 f c^{-1} d}$$

$$X_q'(f,\tau) = \rho \cdot \frac{\arg[X_q(f,\tau)] - \arg[X_Q(f,\tau)]}{4 f c^{-1} d}$$

$$X_q'(f,\tau) = \rho \cdot \frac{\Psi(\arg[X_q(f,\tau)/X_Q(f,\tau)])}{4 f c^{-1} d}$$

instead of Equation (60). Here, ρ is a constant (for example ρ = 1).

While the norm normalizing section 2122 in the sixth embodiment 
performs normalization so that the norm has a value of 1, it may perform 
normalization so that the norm has a predetermined value other than 1. 
Furthermore, the norm normalizing section 2122 may be left out, so that 
norm normalization is omitted. In that case, the clustering section 
2123 clusters the frequency-normalized vectors X'(f, τ). However, the norms 
of the frequency-normalized vectors X'(f, τ) are not equal. Therefore, the 
clustering is performed based on whether vectors are similar only in direction, 
rather than both in direction and norm. This means evaluation based on the 
degree of similarity. One example of the measure of similarity is the 
cosine distance 

$$\cos\theta = \frac{|X'^{H}(f,\tau) \cdot \eta_k|}{\|X'(f,\tau)\| \cdot \|\eta_k\|}$$

where θ is the angle between a frequency-normalized vector X'(f, τ) and the 
vector of the centroid η_k. If the cosine distance is used, the clustering 
section 2123 generates a cluster that minimizes the total sum of cosine 
distances 

[0220] [Formula 61] 
$$U_i = \sum_{X'(f,\tau) \in C_i} \frac{|X'^{H}(f,\tau) \cdot \eta_i|}{\|X'(f,\tau)\| \cdot \|\eta_i\|}$$



Here, the centroid η_k is the average among the members of each cluster. 
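
The cosine measure and the centroid described above can be sketched as follows. This is an illustration under stated assumptions, not the clustering procedure of the embodiment: vectors are plain Python lists of complex numbers, and the helper names and sample cluster are hypothetical.

```python
import math

def norm(v):
    # Euclidean norm of a complex vector
    return math.sqrt(sum(abs(z) ** 2 for z in v))

def cosine(x, eta):
    # cos(theta) = |x^H eta| / (||x|| ||eta||)
    inner = sum(complex(a).conjugate() * b for a, b in zip(x, eta))
    return abs(inner) / (norm(x) * norm(eta))

def centroid(members):
    # element-wise average of the members of one cluster
    n = len(members)
    return [sum(column) / n for column in zip(*members)]

def cluster_score(members, eta):
    # U_i: sum of the cosine measure over the members of cluster C_i
    return sum(cosine(x, eta) for x in members)

# Two hypothetical members that differ only in norm, not in direction.
members = [[1 + 1j, 2 - 1j], [3 + 3j, 6 - 3j]]
eta = centroid(members)
U = cluster_score(members, eta)
```

Because the measure divides by both norms, the two members above score identically against the centroid even though one is three times as long as the other. This is the sense in which the clustering compares vectors in direction only.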



The reference values Q and Q' given above may or may not be 
equal. 

The same value of parameter d may be set for all sensors q or 
different values of parameter d may be set for different sensors q. For 
example, the distance between a reference sensor and a sensor q may be set as 
the value of parameter d for the sensor q. 

[0221] Furthermore, the separated signal generating section 2124 may 
generate, instead of 
[0222] [Formula 62] 

$$Y_k(f,\tau) = \begin{cases} X_{Q'}(f,\tau) & X''(f,\tau) \in C_k \\ 0 & \text{otherwise} \end{cases}$$

the following binary mask 

[0223] [Formula 63] 

$$M_k(f,\tau) = \begin{cases} 1 & X''(f,\tau) \in C_k \\ 0 & \text{otherwise} \end{cases}$$

and obtain the k-th element Y_k(f, τ) of a separated signal vector Y(f, τ) as 

$$Y_k(f,\tau) = M_k(f,\tau) \, X_{Q'}(f,\tau)$$
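
The mask-based variant can be sketched as below. The sketch assumes that the clustering result is available as a table of cluster labels, one per time-frequency point, and that the observation at the reference sensor is stored in a table of the same shape; both tables and the function names are hypothetical.

```python
def binary_mask(labels, k):
    # M_k(f, tau) = 1 where the point was assigned to cluster C_k, else 0
    return [[1 if label == k else 0 for label in row] for row in labels]

def apply_mask(mask, x_ref):
    # Y_k(f, tau) = M_k(f, tau) * X_Q'(f, tau)
    return [[m * x for m, x in zip(mask_row, x_row)]
            for mask_row, x_row in zip(mask, x_ref)]

# Rows are frequencies f, columns are time frames tau (illustrative values).
labels = [[0, 1, 1],
          [1, 0, 0]]
x_ref = [[1 + 1j, 2j, 3],
         [4, 5 - 1j, 6j]]

mask_0 = binary_mask(labels, 0)
mask_1 = binary_mask(labels, 1)
y_1 = apply_mask(mask_1, x_ref)
```

Multiplying by the mask gives the same separated signal as selecting the reference observation directly, as in Formula 62. Keeping the mask as a separate quantity leaves open the choice of which sensor signal it is applied to.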

While a Fourier transformation or an inverse Fourier transformation 
is used for transformation between the frequency domain and the time domain 
in the embodiments described above, a wavelet transformation, DFT filter 
bank, polyphase filter bank or the like may be used for the transformation (for 
example see R. E. Crochiere, L. R. Rabiner, "Multirate Digital Signal 
Processing." Englewood Cliffs, NJ: Prentice-Hall, 1983 (ISBN 
0-13-605162-6)). The operations described above may be performed in time 
sequence in accordance with the description or may be performed in parallel 
or separately, depending on the throughput capacity of the apparatus that 
performs the operations. It will be understood that any other modifications 
may be made without departing from the spirit of the present invention. 
[0224] If any of the embodiments described above is implemented by a 
computer, operations to be performed by each apparatus are described by a 
program. The processing functions described above are implemented on the 
computer by executing the program. 

The program describing these processing operations can be 
recorded on a computer-readable recording medium. The computer-readable 
recording medium may be any medium such as a magnetic recording device, an 
optical disk, a magneto-optical recording medium, or a semiconductor memory. 
In particular, the magnetic recording device may be a hard disk device, a 
flexible disk, or a magnetic tape; the optical disk may be a DVD (Digital 
Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc 
Read Only Memory), or a CD-R (Recordable)/RW (Rewritable); the 
magneto-optical recording medium may be an MO (Magneto-Optical disc); 
and the semiconductor memory may be an EEP-ROM (Electronically 
Erasable and Programmable-Read Only Memory). 

[0225] The program may be distributed by selling, transferring, or leasing 
a removable recording medium such as a DVD or a CD-ROM, for example, 
on which the program is recorded. Alternatively, the program may be 
distributed by storing it in a storage device of a server computer beforehand 
and transmitting it from the server computer to another computer via a 
network. 

In an alternative embodiment, a computer may directly read the 
program from a removable recording medium and execute processing 
according to the program, or the computer may execute processing according 
to the program each time the program is transmitted from a server to the 
computer. Alternatively, the computer may execute the processing described 
above using an ASP (Application Service Provider) service in which the 
program itself is not transmitted from a server computer to the computer; 
instead, the computer implements the processing by obtaining only 
instructions of the program and the results of execution of the instructions. 
The program in this mode includes information that is made available for 
processing by a computer and is a quasi-program (such as data that are not 
direct instructions to a computer but define processing to be performed by 
the computer). 

[0226] While a given program is executed on a computer to configure the 
present embodiments, at least part of the processing described above may be 
implemented by hardware. 

Industrial Applicability 

[0227] According to the present technique, a target signal can be 
accurately extracted in a real environment in which various interfering signals 
are generated. Examples of applications to sound signals include a speech 
separation system which functions as a front-end system of a speech 
recognition apparatus. Even in a situation where a human speaker and a 
microphone are distant from each other and therefore the microphone collects 
sounds other than the speech of the speaker, such a system can extract only 
the speech of that speaker to enable the speech to be properly recognized. 



